Research:Implications of ChatGPT for knowledge integrity on Wikipedia

Created: 06:07, 12 July 2023 (UTC)
Duration: July 2023 – June 2024
Keywords: ChatGPT, knowledge integrity, misinformation, AI, large language models
Grant ID: G-RS-2303-12076

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


Generative AI systems, including large language models (LLMs) like ChatGPT and multimodal models, have captured global attention, promising great advances in the distribution of information. Some in the Wikimedia community have identified possibilities for generative AI in editing and other tasks: for example, generating article drafts, summarising sources, producing transcriptions of video and more easily querying Wikidata content.[1][2] Others have highlighted the risks of generative AI, including polluting Wikipedia with swathes of AI-generated content or producing automated comments that simulate the appearance of discussion, debate and consensus. These may make the job of maintaining quality content more difficult, violate core principles and undermine Wikipedia's peer-production system.[1][2]

The aim of this project was to explore the implications of generative AI for knowledge integrity on Wikipedia and to investigate the extent to which Wikipedia rules, practices and policies can maximise the benefits as well as address any risks.

Knowledge integrity was a Wikimedia Foundation program established in 2019 that sought to investigate how Wikimedia projects could serve as a foundation for a broader, linked infrastructure of trust serving the entire internet, in a vision of "knowledge as a service."[3] As a concept, rather than an internal program, knowledge integrity is essential to Wikipedia's ability to fulfill its mission as an encyclopedia: its core content policies and editorial practices, taken as a whole, can be understood as promoting and protecting the integrity of the knowledge it contains.

Rooted in what Wikimedians are saying about the implications of generative AI, and applied in the context of the Wikimedia Foundation's knowledge integrity framework, this project maps out the most important areas for policy intervention and adjustment of current practice to deal with potential risks to Wikimedia and the larger internet infrastructure. This work supports the 2030 Strategic Direction in its aim to ensure that Wikimedia constitutes essential infrastructure of the ecosystem of free knowledge.[4]

Our project sought to answer the following questions:

  • RQ1: Does the potential use of generative AI threaten knowledge integrity on Wikipedia? If so, how?
  • RQ2: What steps could be taken (in terms of on-platform governance or other measures) to address any risks to knowledge integrity?

Our data comprised hundreds of thousands of words from on-wiki discussions between November 2022 and November 2023, supplemented by 15 in-depth interviews with Wikimedians conducted at the 2023 Wikimania conference in Singapore and online.

The results of these discussions are important because Wikimedians are at the coalface of the open knowledge ecosystem and have keen insights into the role of machine learning and automation in public information systems. Wikimedians have had conversations about AI and its impact for at least the past 15 years, and the Wikimedia Foundation has historically embraced automated technologies. Machine learning and AI tools have been used to detect potential vandalism, assess article quality and importance, translate content and classify images. But Wikimedia volunteers generally deploy these tools critically. The use of bots and other automated tools is debated and regulated according to community governance procedures so that their applications are limited and human judgement remains paramount. This occurs in the context of a long-standing collaborative approach to policy formation and content governance. Wikipedia’s core content policies, including verifiability, neutral point of view, and no original research, reflect a common vision of Wikipedia as an open encyclopedic project.
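
The sketch below is purely illustrative and not part of this study: it shows how a single revision might be scored for potential damage and article quality, in the style of the revision-scoring services Wikimedia communities have used alongside human review. The endpoint, model names and response format are assumptions based on public documentation of the ORES service, which has since been superseded by the Lift Wing platform, so details may differ.

    # Illustrative only: request machine-learning scores for one revision,
    # in the style of the (now superseded) ORES scoring service used for
    # vandalism and article-quality assessment. Endpoint, model names and
    # response shape are assumptions based on public documentation.
    import requests

    def score_revision(revid: int, wiki: str = "enwiki") -> dict:
        """Request 'damaging' and 'articlequality' scores for one revision ID."""
        url = f"https://ores.wikimedia.org/v3/scores/{wiki}/"
        params = {"models": "damaging|articlequality", "revids": revid}
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        # Hypothetical revision ID; substitute a real one when experimenting.
        print(score_revision(1234567890))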

In 2018, the Wikimedia Foundation launched a cross-departmental program known as Knowledge Integrity. Although the program as a whole did not endure beyond 2019, several research strands continue to develop and inform Wikimedia Foundation strategy. The program recognised the critical part that Wikimedia projects play in the broader information ecosystem, with the goal of securing Wikimedia as the “hub of a federated, trusted knowledge ecosystem”.

With the rise of large language models and other generative AI systems, the importance of Wikimedia projects in online knowledge infrastructure has increased even further. Wikimedia projects are crucial sources of high-quality data for AI training and retrieval-augmented generation, and a source of ground truth for testing. The implications of generative AI for Wikimedia projects thus extend beyond opportunities to enhance editing workflows or concerns about how to mitigate threats to the integrity of Wikimedia content, to the role that Wikimedia projects play in the broader information ecosystem. But a perception of opportunity is tempered for many by a fear that generative AI, particularly its incorporation into search engines, may undermine Wikipedia’s centrality and even threaten the open knowledge ecosystem itself.
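
As a purely illustrative aside, the sketch below shows what a minimal retrieval-augmented generation loop grounded in Wikipedia content can look like: search snippets are retrieved from the MediaWiki API and assembled into a prompt that preserves the link between claims and their sources. The call_llm function is a hypothetical placeholder rather than any particular model's API.

    # Illustrative only: a bare-bones retrieval-augmented generation loop that
    # grounds a question in Wikipedia search snippets. call_llm is a placeholder,
    # not a real model API.
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def retrieve(query: str, limit: int = 3) -> list:
        """Return the top search hits (title + HTML snippet) for a query."""
        params = {
            "action": "query", "list": "search", "srsearch": query,
            "srlimit": limit, "format": "json", "formatversion": 2,
        }
        resp = requests.get(API, params=params, timeout=30)
        resp.raise_for_status()
        return resp.json()["query"]["search"]

    def build_prompt(question: str, hits: list) -> str:
        """Assemble a prompt that keeps the link between claims and sources."""
        context = "\n".join(f"- {h['title']}: {h['snippet']}" for h in hits)
        return (
            "Answer using only the Wikipedia excerpts below, citing titles.\n"
            f"{context}\n\nQuestion: {question}"
        )

    def call_llm(prompt: str) -> str:
        """Placeholder: a real system would send the prompt to a language model."""
        return "(model response would appear here)"

    if __name__ == "__main__":
        question = "What is knowledge integrity on Wikipedia?"
        print(call_llm(build_prompt(question, retrieve(question))))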

Understanding how Wikimedians are thinking about the potential opportunities and risks of generative AI is therefore vital for informing both community policy and Wikimedia Foundation strategy.

Methods


In order to understand the implications of ChatGPT and other generative AI tools for Wikipedia and the broader information environment, we first looked to practitioners’ own understandings of the issue. We analysed on-wiki discussions about the implications of ChatGPT and generative AI tools for Wikipedia and asked a group of Wikimedians questions about their implications for knowledge integrity in a series of interviews and focus groups.

Following on-wiki data collection, we engaged in an inductive and emergent qualitative analysis of the data to identify critical themes of the discussion. We then verified the applicability of the emergent coding through the interview content and used the interviews to flag any novel themes that were distinct from those arising from Wikipedia’s online discussions. The interview cohort included many active members of the on-wiki discussions but we also sought to increase the representativeness of the data by identifying potential subjects amongst the culturally and linguistically diverse attendees at Wikimania 2023.

The research design partially mirrors the methodology utilised by Graham and Wright (2015), whereby content analysis approaches were paired with stakeholder interviews to surface new insights. However, each medium required a distinct approach to collecting and analysing the data.

Phase 1 data collection: online discussions and texts


The text-based data for this analysis came from blogs, discussion threads, and internal debates between Wikimedia editors in public online forums such as the “Village Pump”, as well as the Wikimedia online periodical, the “Signpost”, and the Wikimedia blog, “Diff”. These spaces were chosen because the Wikimedia community identified them as primary sites for discussing the relevant challenges, practices, and policies, or because they surfaced during our searches or were referenced in other included content.

Many of the conversations took place on English Wikipedia's Village Pump pages, such as the Village Pump (policy), Village Pump (idea lab), and Village Pump (proposals) pages. These are central discussion forums for the English Wikipedia community. While there were a few mentions of the potential impact of AI on non-English Wikipedias (such as the possibility of using AI-assisted translation to improve the quality of content across languages) the primary focus of the discussions was on the English Wikipedia.

Data were collected between 13 and 25 July 2023. Over 160,000 words of content were coded across 88 articles, blog posts, and discussions [see Table 1]:

Table 1: Data sources for phase 1 analysis

Data source | Description | Scope of dataset
Village Pump | Discussion forums for proposal and debate of policies, initiatives, and other proposals | 56,105 words; 126+* editors; 16 topic pages
Wikimedia-l | Publicly archived and moderated mailing list for discussion by the Wikimedia community and allied organisations supporting its work | 22,558 words; 53 respondents; 12 discussion threads
Wikimedia ‘Diff’ blog | A Wikimedia-community-focused blog established by the communications department at the Wikimedia Foundation (mostly in English but also in other languages) | 7,334 words; 14 contributors; 14 blog posts and discussions
Wikimedia Signpost | An online newspaper for and produced by the Wikimedia editor community | 29,920 words; 73+* editors; 24 feature articles and discussions
English Wikipedia administrative, policy and working pages | Discussions about LLMs across policy, page patrolling, criteria-for-deletion and other pages on English Wikipedia | 47,941 words; 140+ contributors; 22 pages

* Excluding anonymous authors listed by IP address.
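
For readers interested in how this kind of corpus can be assembled, the sketch below shows one way to retrieve the wikitext of public discussion pages via the MediaWiki Action API and produce rough word counts. The page titles and the whitespace-based counting are illustrative assumptions; they are not the exact corpus or coding procedure used for Table 1.

    # Illustrative only: fetch the current wikitext of public discussion pages
    # via the MediaWiki Action API and report naive word counts. Page titles are
    # examples, not the exact corpus coded for this study.
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "knowledge-integrity-example/0.1 (research sketch)"}

    def fetch_wikitext(title: str) -> str:
        """Return the latest revision's wikitext for a given page title."""
        params = {
            "action": "query", "prop": "revisions", "rvprop": "content",
            "rvslots": "main", "titles": title, "format": "json",
            "formatversion": 2,
        }
        resp = requests.get(API, params=params, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        page = resp.json()["query"]["pages"][0]
        return page["revisions"][0]["slots"]["main"]["content"]

    if __name__ == "__main__":
        for title in [
            "Wikipedia:Village pump (policy)",
            "Wikipedia:Village pump (idea lab)",
            "Wikipedia:Large language models",
        ]:
            text = fetch_wikitext(title)
            print(f"{title}: ~{len(text.split())} words of wikitext")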

Interviews


Interviewees were selected from two sources: the online discussions and attendees at the Wikimania 2023 event in Singapore. Participants in the online discussions were contacted based on their activity and expressed interest in the subject. Potential participants were ranked by the frequency of their contributions to discussions, demonstrated knowledge of and/or interest in generative AI, and engagement with Wikipedia policies, practices, and tools. The highest-ranked candidates were contacted first, until 10 respondents recruited from online sources agreed to be interviewed; some of these participants were interviewed in person at Wikimania 2023, alongside additional participants recruited at the event.

In total, 16 participants were interviewed: 6 Wikimania attendees in 2 focus groups, and 10 individual interviewees comprising 7 online forum participants and 3 researchers at the Wikimedia Foundation. Five participants identified as women and 10 spoke a first language other than English. Two of the participants were employed by the Wikimedia Foundation, one was a board member, and several held official positions in Wikimedia chapters and projects at the time.

The research interviews had two distinct focuses, depending on the interviewee group:

  • Wikimedians: starting with two in-person focus groups at Wikimania Singapore, followed by individual interviews with participants identified in the online discussions, including Wikimedia-l. The goal was to understand to what extent LLMs are already having an impact on Wikipedia practice, which areas of practice might be most affected, and whether there are other risks, not already identified, that would be useful to consider. We focused on community members with direct experience working in areas most likely to be affected by or related to LLMs (e.g. new page patrol, bot policy).
  • Wikimedia Research Team members: particularly those connected to the Knowledge Integrity program. The goal with this group was to understand how knowledge integrity relates in practice to questions of verifiability and provenance and to garner ideas about what is possible in terms of governing LLMs (given previous practice in relation to governing other automated processes and tools).

Coding


The data from the online discussions and from the interviews were then coded, but each data set required a different approach. The online discussions provided a large and broad base of data that presented a wide range of themes; the challenge was to consolidate this volume of content into insights that could provide answers to the research questions. Conversely, the interviews provided more direct answers to these questions but consequently fewer opportunities for emergent analysis that could identify novel and unforeseen problems and solutions. Drawing these sources of data together was a key task in answering the research questions.

Policy, Ethics and Human Subjects Research


This research focused on achieving data saturation through non-intrusive and non-disruptive methods. While it used emergent approaches, all contact was preceded by desk-based research through Wikimedia texts as well as scholarly research and reports. Interviews and focus groups were conducted in discussion-focused environments like Wikimania and addressed topics of direct relevance to participants' daily practices. All participation was voluntary, and candidates were able to decide whether, and how much, to contribute, as well as whether and how they wished to be identified. The research adhered to rigorous academic ethical standards monitored by the ethical review processes of the University of Technology Sydney.

Results


In interviews and focus groups, editors discussed benefits of generative AI tools relating to opportunities for improving productivity and expanding the scope of content across languages and in different formats. When asked about challenges, editors talked about risks both internal and external to Wikipedia. Internally, editors were concerned about the proliferation of AI-generated misinformation or biased content because of the difficulty of fact-checking and verifying AI outputs. Some worried that AI-generated content could overwhelm human-in-the-loop curation processes because of the challenges of detecting and distinguishing AI-written text. There was also concern about the uneven distribution of risk: smaller language versions face higher risks because generative AI tools are not available for all languages and because small editor numbers could tip the balance away from human control over curation, especially among non-English speakers.

Externally, editors were concerned about the (unlicensed) commercial exploitation of open content and editor labour by AI companies. They were also concerned about the long-term sustainability of Wikipedia as people turn increasingly to AI for their information needs. This also raised concerns about verifiability and other core principles and practices that underpin Wikipedia’s epistemic value: when the link between a claim and its source is broken, readers are no longer able to check the veracity of a claim or participate in the process of knowledge curation. And given Wikipedia’s increasingly critical role in the information ecosystem, including as a source of high-quality data for AI training and retrieval-augmented generation, some editors were concerned that information pollution on Wikipedia could lead to degradation of the broader information environment, or even model collapse.

Many of the editors we spoke to believed that internal risks can likely be managed via existing policy and practices, while others urged the need for AI-specific policy. This disagreement was reflected in debate over a proposed English Wikipedia LLM policy, which failed to reach consensus. Many editors saw value in technical measures to mitigate risks, e.g. detecting and marking AI-generated content and using AI plugins. Others suggested that education about generative AI is needed so that new editors learn how to use the tools effectively. Many of those concerned about the increased risk to non-English Wikipedias thought there would be little incentive for AI companies to develop useful models for smaller languages. Many expressed similar concerns over problems of algorithmic bias and hallucination.

A summary of the main themes uncovered in discussions with the Wikimedia community follows.

1. Perceived opportunities of generative AI


1.1 AI-Assisted Content Creation

  • Drafting article stubs or outlines for human editors
  • Suggesting missing topics or information gaps
  • Aiding with knowledge base queries and information synthesis
  • Improving language, correcting linguistic errors and formatting
  • Supporting editors’ writing in non-native languages
  • Creating content in new formats, including illustration

1.2 Content Enhancement and Optimization

  • Improving language translation between Wikipedia versions
  • Suggesting relevant internal links and connections
  • Automating multimedia classification and captioning

1.3 Editor Workflow Augmentation

  • Prioritizing articles for improvement based on quality scores
  • Flagging potential vandalism or low-quality edits
  • Assisting with referencing and citation recommendations
  • Assisting with consensus evaluation
  • Writing code to compare and analyse articles

2. Perceived challenges of generative AI

2.1 Copyright, licensing and attribution concerns

  • Training data sources and potential copyright infringement
  • Commercial exploitation of openly licensed content
  • Lack of attribution for AI-generated text

2.2 Reliability and verifiability issues

  • Potential for proliferation of AI-generated misinformation or biased content, particularly on smaller language versions
  • Difficulty in fact-checking and verifying AI outputs
  • Lack of transparency in AI language model training
  • Violation of other content policies, e.g. puffery

2.3 Risks to editorial practices

  • Concerns about AI-generated content bypassing human curation
  • Potential for misuse of AI by malicious actors
  • Challenges in detecting and distinguishing AI-written text
  • Risks of over-reliance on AI at the expense of human knowledge
  • Challenges to core policies and concepts including authorship

2.4 Uneven distribution of risk

  • Lack of contributors is a risk for smaller language versions in both editorial resourcing and policy development

2.5 Threats to the sustainability of Wikipedia and to the knowledge and information ecosystem

  • Wikipedia plays a key role in the information ecosystem
  • Wikipedia may become unsustainable as people turn to AI for their information needs
  • Potential for degradation of information ecosystem and AI model collapse
  • AI companies’ lack of transparency and accountability undermines open knowledge

3. What could be done to address risks and embrace opportunities?


3.1 Safeguards for AI Integration in Wikipedia

  • For many, internal risks can likely be managed via existing policy and practices, while others urge the need for AI-specific policy
  • Most see value in technical measures to mitigate risks, e.g. detecting and marking AI-generated content, and AI plugins
  • The need to preserve human editorial oversight and curation processes is paramount
  • Transparency is important for a wide range of AI use
  • Education and support on genAI are needed for new editors

3.2 Addressing external ecosystem and market risks

  • Transparency and accountability for AI companies are critical for managing knowledge integrity in the broader information ecosystem and existential threats to Wikipedia
  • Responsibility needs to be distributed amongst many actors and stakeholders
  • The Wikimedia Foundation and Wikimedia community need to be pro-active in advocating for Wikimedia and open knowledge, including in copyright and licensing.

Discussion


When we take a narrow, on-platform view of Wikipedia, our results suggest that the threats posed by generative AI are largely manageable through existing editorial and curation practices. But when we take a broader view of Wikipedia's role in the production of public knowledge, risks emerge that may be more difficult to manage through internal Wikimedia practices or policy.

These risks arise in three areas linked to the integrity of Wikipedia as a source of open knowledge: sustainability, equity in knowledge production, and verifiability. First, enhanced affordances for editors to add automatically generated content to Wikipedia will undoubtedly increase the burden of maintenance, which may become unsustainable, particularly for smaller language Wikipedias. Also in relation to sustainability, the wholesale extraction of Wikimedia content for AI training without acknowledgement of the open licensing conditions reflects a larger risk to the ongoing sustainability of open knowledge. Second, the growth in generative AI tools functional solely in large languages like English may worsen the existing disparity between different language versions. These risks are well understood in the community and have been detailed in recent research.[5] Third – implicit, though less visible, in our data – are the inevitable challenges to Wikipedia’s principle of verifiability as its data is increasingly extracted and abstracted by third-party tools, leaving follow-on users unable to easily discover the source of factual claims or participate in their correction.

The implications of generative AI may be significant for Wikipedia, but their significance for the health, vitality and diversity of our public sphere is even more critical. The key elements of knowledge integrity, such as verifiability, are useful principles for establishing trustworthy information more broadly in a digitally mediated information environment. On Wikipedia, knowledge integrity is enabled through a suite of policies, standards and practices. Verifiability supports a reader’s ability to check the veracity of a claim on Wikipedia and their ability to participate in the process of correcting inaccurate claims, i.e. becoming active in the curation of public knowledge claims. The principle of verifiability underpins the epistemic value both of Wikipedia itself and of the data extracted from it for use in other systems. This enables a system of public verification and situates Wikipedia in a broader epistemic ecosystem. But Wikipedia's increasingly important role as a provider of ‘knowledge as a service’ threatens to undermine the integrity of this epistemic ecosystem – a threat apparent in unreliable or non-existent attribution and in obstructed pathways to participation in the epistemic commons.

The use of the term “integrity” is on the rise as an antidote to online speech threats. The United Nations Global Principles for Information Integrity (2024), for example, state that “Promoting information integrity involves empowering people to exercise their right to seek, receive and impart information and ideas of all kinds and to hold opinions without interference.” Despite the growing use of the term, however, there is little concrete engagement with how information integrity could be achieved.

Wikimedians recognise that knowledge integrity is a feature of a trustworthy information system, but that trustworthiness is not some abstract quality determined from on high. Instead, it depends on a system of people, practices, material affordances and rules that enable the practice of verifiability. Knowledge integrity and verifiability become particularly important when claims are made about people, raising questions of epistemic justice (Frost-Arnold 2021; Watson 2021).

As the capabilities of genAI systems rapidly evolve, substantive debates are unfolding across Wikimedia's communities about striking a balance that embraces the innovative assistive benefits of genAI while upholding core principles around informational reliability, neutrality and human oversight. Ultimately, how Wikipedia navigates this technological disruption will likely have far-reaching impacts on other sources of public knowledge and the integrity of our information environment worldwide.

Resources


Grants:Programs/Wikimedia Research Fund/Implications of ChatGPT for knowledge integrity on Wikipedia

References

  1. Harrison, S. (12 January 2023). “Should ChatGPT Be Used to Write Wikipedia Articles?”. Slate. https://slate.com/technology/2023/01/chatgpt-wikipedia-articles.html
  2. Wikimedia contributors (2023a). Community Call Notes. Accessed 29 March 2023. https://meta.wikimedia.org/w/index.php?title=Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/External_Trends/Community_call_notes&oldid=24785109
  3. Zia, Leila; Johnson, Isaac; M, B.; Morgan, Jonathan; Redi, Miriam; Saez-Trumper, Diego; Taraborelli, Dario (14 February 2019). "Knowledge Integrity - Wikimedia Research 2030". doi:10.6084/m9.figshare.7704626.v2
  4. Wikimedia contributors (2023b). Movement Strategy. Accessed 29 March 2023. https://meta.wikimedia.org/w/index.php?title=Movement_Strategy&oldid=24329161
  5. Vetter, Matthew A.; Jiang, Jialei; McDowell, Zachary J. (19 February 2025). "An endangered species: how LLMs threaten Wikipedia’s sustainability". AI & SOCIETY. ISSN 1435-5655. doi:10.1007/s00146-025-02199-9