Future Audiences/Experiments: conversational/generative AI


Future Audiences Objective 2 Key Result 2: Test a hypothesis around conversational AI knowledge seeking, to explore how people can discover and engage with content from Wikimedia projects.

Rationale


Large language models (LLMs) and tools/applications built on them – e.g., ChatGPT, Google Bard, Microsoft Copilot – present both opportunities and risks to our movement. AI/ML technologies have been a part of our movement for over a decade and have assisted human contributors to our projects with, e.g., content translation, vandalism patrol, and structured newcomer tasks. With the latest generation of AI/ML tools, there may be more opportunities to make searching, consuming, and/or contributing to Wikimedia projects easier, more intuitive, and more accessible for more people.

On the other hand, AI assistants may pose a serious risk to the sustainability of our movement if they become primary entry points for knowledge-seeking while their output does not provide attribution for, or pathways to contributing to, Wikimedia projects or communities. They may also generate large quantities of low-quality content that could overburden our moderation systems.

There is a high degree of legal, technical, and social complexity to the use of AI within the Wikimedia context, and this year we are committed to gathering more data and insights to inform how we think about future strategic investments in this space.

Experiments

See also: Report of the ChatGPT experiment

Wikipedia ChatGPT plugin

Status: Concluded

Hypothesis: If we create a Wikipedia ChatGPT plugin that summarizes content from Wikipedia and attributes/links to our projects, we can better understand how users want to interact with knowledge via AI assistants, and how/whether our content might improve their experience.

[Status: as of 2 February 2023, the experiment has concluded. Explanation and details]
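To make the interaction pattern concrete, below is a minimal sketch of the retrieval-and-attribution step such a plugin could perform: find a relevant article through the public MediaWiki search API, fetch a short summary through the REST API, and return the extract together with a canonical link for attribution. The two endpoints are real public Wikimedia APIs; the function name, types, and result shape are illustrative assumptions, not the plugin's actual implementation.

```typescript
// Hypothetical sketch of a "fetch Wikipedia content with attribution" step.
// The endpoints are the public MediaWiki Action API and REST summary API;
// everything else (names, result shape) is illustrative only.

interface AttributedExtract {
  title: string;
  extract: string;   // plain-text summary the assistant may paraphrase
  sourceUrl: string; // canonical page URL for attribution/linking
  license: string;   // Wikipedia text is available under CC BY-SA
}

async function lookupWikipedia(query: string, lang = "en"): Promise<AttributedExtract | null> {
  // 1. Find the most relevant article for the user's query.
  const searchUrl =
    `https://${lang}.wikipedia.org/w/api.php?action=query&list=search` +
    `&srsearch=${encodeURIComponent(query)}&srlimit=1&format=json&origin=*`;
  const search = await (await fetch(searchUrl)).json();
  const hit = search?.query?.search?.[0];
  if (!hit) return null;

  // 2. Fetch a short summary of that article via the REST API.
  const summaryUrl =
    `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(hit.title)}`;
  const summary = await (await fetch(summaryUrl)).json();

  // 3. Return the extract together with the attribution link the assistant is asked to surface.
  return {
    title: summary.title,
    extract: summary.extract,
    sourceUrl:
      summary.content_urls?.desktop?.page ??
      `https://${lang}.wikipedia.org/wiki/${encodeURIComponent(hit.title)}`,
    license: "CC BY-SA",
  };
}
```

In the plugin setting, the extract would be passed back to the assistant to summarize, with sourceUrl surfaced to the user as the attribution link.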

Success criteria and results: details
 
[Screenshot: MVP Wikipedia ChatGPT plugin response]
MVP

Priority: P0
Assumption: People using AI assistants will want to receive knowledge from Wikipedia for some of their queries
Data/metrics:
  • # of plugin queries/day
  • # of query sessions
Success condition:
  • ~1000s of queries/day
  • Queries, users, and/or queries per user per day increase over time (indicating sustained usage/usefulness)

Priority: P0
Assumption: Knowledge arrives with fidelity to end-users
Data/metrics:
  • Relevance (does ChatGPT find and summarize back relevant Wikipedia content?)
  • Accuracy (does ChatGPT correctly summarize knowledge from Wikipedia?)
  • Attribution (does ChatGPT follow our instructions for attributing and linking to Wikipedia?)
Success condition:
  • Qualitative assessment of user queries & results

Priority: P0
Assumption: People want to get knowledge in non-English languages from Wikipedia via an AI assistant
Data/metrics:
  • All of the above, broken out by the languages tested

Preliminary results

MVP

Priority: P0
Assumption: People using AI assistants will want to receive knowledge from Wikipedia for some of their queries
Results:
  • ~500-1000 queries per day in the first month since launch
  • Queries and queries per user per day trending up week over week
Conclusion: The plugin has modest adoption but appears to be providing value to the users who have enabled it.

Priority: P0
Assumption: Knowledge arrives with fidelity to end-users
Results:
  • Relevance: 84%
  • Accuracy: 85-89% (based on different quality coders' results)
  • Attribution: 68%
Conclusion: Overall, relevance and accuracy are fairly high (on par with Wikipedia itself according to some reliability studies), but attribution is inconsistent and needs further investigation.

Priority: P0
Assumption: People want to get knowledge in non-English languages from Wikipedia via an AI assistant
Results:
  • Across the languages analyzed (English, German, French, Japanese, Russian), English results were generally higher on accuracy and attribution
  • More significant issues with accuracy and attribution were noted in some non-English languages (e.g., no attribution in Russian and much lower accuracy rates, 30-70%, in Russian and German)
Conclusion: Understanding why quality differs across languages is an important next step.
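For readers curious how percentages like those above can be produced, here is a hedged sketch (not the team's actual analysis code): human coders label each sampled query/response pair as passing or failing on each dimension, and the reported rate is the share of labels that pass. All names and data shapes are hypothetical.

```typescript
// Hypothetical scoring sketch: aggregate human coders' pass/fail labels per
// dimension (relevance, accuracy, attribution) into percentage rates.
// Names and data shapes are illustrative only.

type Dimension = "relevance" | "accuracy" | "attribution";

interface CodedSample {
  queryId: string;
  coder: string;
  labels: Record<Dimension, boolean>; // true = the response passed this check
}

function dimensionRates(samples: CodedSample[]): Record<Dimension, number> {
  const dims: Dimension[] = ["relevance", "accuracy", "attribution"];
  const rates = {} as Record<Dimension, number>;
  for (const dim of dims) {
    const passed = samples.filter((s) => s.labels[dim]).length;
    rates[dim] = samples.length ? (100 * passed) / samples.length : 0;
  }
  return rates;
}

// Example: two coders, two sampled queries.
const example: CodedSample[] = [
  { queryId: "q1", coder: "A", labels: { relevance: true, accuracy: true, attribution: false } },
  { queryId: "q1", coder: "B", labels: { relevance: true, accuracy: false, attribution: false } },
  { queryId: "q2", coder: "A", labels: { relevance: true, accuracy: true, attribution: true } },
  { queryId: "q2", coder: "B", labels: { relevance: false, accuracy: true, attribution: true } },
];
console.log(dimensionRates(example)); // { relevance: 75, accuracy: 75, attribution: 50 }
```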

"Citation Needed" browser extension

Status: In progress

Hypothesis: If we leverage Wikipedia’s reputation for independence and reliability by making it available across the web, people will use it to verify claims they encounter on the internet.

Minimum Viable Product


As a user of Google Chrome, I can install a browser extension that allows me to:

  • Select passages of text I come across on the web
  • Receive back information about whether the claim(s) in this text match any relevant content on Wikipedia
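As a rough illustration of the flow in the two bullets above (and not the extension's actual implementation), the sketch below takes the user's selected text, queries the public MediaWiki search API, and reports whether Wikipedia appears to have related coverage. The matching is deliberately naive keyword search, and all names here are assumptions.

```typescript
// Hypothetical content-script sketch for checking a selected passage against
// Wikipedia. Only the MediaWiki search endpoint is real; the matching logic
// is deliberately naive (keyword search, not genuine claim verification).

interface ClaimCheck {
  claim: string;
  matches: { title: string; url: string; snippet: string }[];
}

async function checkSelectedClaim(): Promise<ClaimCheck | null> {
  const claim = window.getSelection()?.toString().trim();
  if (!claim) return null;

  const apiUrl =
    "https://en.wikipedia.org/w/api.php?action=query&list=search" +
    `&srsearch=${encodeURIComponent(claim)}&srlimit=3&format=json&origin=*`;
  const data = await (await fetch(apiUrl)).json();

  const matches = (data?.query?.search ?? []).map((hit: any) => ({
    title: hit.title,
    url: `https://en.wikipedia.org/wiki/${encodeURIComponent(hit.title.replace(/ /g, "_"))}`,
    snippet: hit.snippet, // HTML snippet with the matching terms highlighted
  }));

  // The extension popup could then show "related coverage found" with links,
  // or "no match found" when the array is empty.
  return { claim, matches };
}
```

In a real extension this would be wired to a context-menu item or popup, and genuine claim verification would require more than full-text search.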

Key research questions

  • Do people on the internet want Wikipedia content when not on our website?
  • Do people trust Wikipedia content and brand as a reliable source of information?
  • Can we reach new audiences? Or create new opportunities for current audiences to use Wikipedia?

"Add A Fact"


Hypothesis: If we make it easy for off-platform readers to add claims/facts from third-party websites, their contributions can help sustain and grow content in a potential future where most people consume Wikimedia content off-platform.
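To make the kind of contribution this hypothesis imagines more concrete, here is a hedged sketch of what a submitted "fact" might carry before entering a human-review pipeline. The payload shape, field names, and example are assumptions for illustration, not an existing Wikimedia API or data model.

```typescript
// Hypothetical payload for an off-platform "Add A Fact" contribution.
// Nothing here is an existing Wikimedia API; it only illustrates what a
// human-review pipeline might need in order to evaluate a submitted claim.

interface FactSubmission {
  claim: string;             // the fact as stated by the contributor
  sourceUrl: string;         // third-party page the claim was found on
  quotedText?: string;       // passage supporting the claim, if selected
  suggestedArticle?: string; // Wikipedia article the contributor thinks it belongs in
  language: string;          // wiki language edition, e.g. "en"
  submittedAt: string;       // ISO 8601 timestamp
}

// Example submission as it might be queued for human review:
const submission: FactSubmission = {
  claim: "The Atacama Desert is the driest nonpolar desert in the world.",
  sourceUrl: "https://example.org/some-science-article",
  quotedText: "…the Atacama is widely considered the driest nonpolar desert…",
  suggestedArticle: "Atacama Desert",
  language: "en",
  submittedAt: new Date().toISOString(),
};

// A reviewer-facing queue could then group submissions by suggestedArticle
// and surface them to editors, e.g. via an on-wiki worklist or tool.
console.log(JSON.stringify(submission, null, 2));
```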

Key research questions

  • Do people on the internet want to contribute good-faith information to Wikipedia?
  • Who are the people who would be interested in doing this? For example:
    • The general public
    • People who are Wikipedian-like in some way – e.g., Reddit moderators or other internet subgroups (fandoms, communities, fact-checkers, etc.)
    • Existing Wikipedians
  • How might we deliver these contributions into existing or new pipelines for human review/oversight/addition to Wikipedia?

Other ideas


If you have more ideas, please leave them on the talk page!
