NLP for Wikipedia (ACL 2025)/Call for Papers
2nd WikiNLP: Advancing Natural Language Processing for Wikipedia
Co-located with ACL 2025
Important dates
- Call for participation release: 31 January 2025
- Paper submission deadline: 30 April 2025 (extended from 23 April 2025)
- Accept/reject notification: 30 May 2025
- Camera-ready paper deadline: 23 June 2025
- Workshop: 1 August 2025
All deadlines are midnight, Anywhere on Earth (AoE).
Submission link
Please follow the submission guidelines below and submit your work via OpenReview.
Submission guidelines
This year, the WikiNLP workshop invites contributions across three tracks:
- Wikimedian provocations
- Datasets
- Ongoing or published work on NLP for Wikimedia
Detailed descriptions for each track can be found below.
If you have questions about potential research ideas or existing resources in a given topical area, please do not hesitate to reach out to the workshop organizers and we will do our best to help out.
Track 1: Wikimedian provocations (non-archival)
This track provides a platform for Wikimedians to share their needs for NLP tools, features, or policies that can support the Wikimedia community and advance its mission. We invite participants involved in improving and maintaining Wikimedia projects (whether as editors, patrollers, stewards, or any other Wikimedians) to share ideas on how NLP could support their work. Submissions to this track aim to offer NLP researchers guidance on the actual needs of Wikimedians.
Submissions to this track may cover but are not limited to the following topics:
- What NLP tools are essential or can be extremely helpful in improving or maintaining Wikimedia projects?
- What NLP-related features are missing from the user interface of Wikimedia but are essential for Wikimedia projects?
- What policies might help govern the use of NLP tools?
- What practices in NLP research could be beneficial or harmful to the Wikimedia community?
- What NLP-related datasets or resources could help assess the current state of Wikimedia projects?
Submissions must be written in English on Meta Wiki and follow the template provided here. Links to your user page(s) on Wikimedia, as well as relevant Wikimedia essays, guidelines, policies, and discussions, are also welcome.
Submissions are reviewed by the workshop organizers based on their feasibility and relevance to NLP. Submissions to this track are non-archival. Accepted submissions will be featured on the website and discussed during the workshop, but will not be included in the official proceedings. Authors of accepted submissions are welcome, but not required, to participate in the workshop.
Track 2: Datasets (archival)
This track provides a venue for NLP researchers and practitioners to share datasets that could be useful for Wikimedia projects. Specifically, we have identified a list of high-priority areas that could help improve or maintain Wikipedia. We invite participants to review these pre-identified NLP tasks and curate a dataset that can be used to train or evaluate NLP models for accomplishing these tasks. We have placed a particular focus on datasets related to Wikipedia policies and on feature requests made by Wikimedians themselves. Participants are also welcome to propose additional NLP tasks and datasets not included in the list above, but please clearly detail the expected value to the Wikimedia community.
To ensure that the submitted datasets are truly useful to the Wikimedia community, the workshop organizers are holding regular office hours where we can provide feedback to people interested in this track on their submissions. The dataset guidance also points to various resources and past work that can be used as inspiration.
The submission should include: (1) a detailed description of the dataset (e.g., following the best practices outlined in Datasheets for Datasets), (2) a demonstration of the dataset's use (e.g., the performance of a model trained with the dataset or a useful evaluation based on the dataset), and (3) a link to the dataset, which should be publicly accessible on GitHub, HuggingFace, or another site. Submissions must be written in English, following ACL's template and guidelines for long papers. They should be no longer than eight pages and submitted in PDF format through OpenReview. A completed paper checklist (see below) must also be appended.
Review is double-blind; each submission is peer-reviewed by three program committee members. Submissions to this track are archival, meaning accepted submissions will be included in the official proceedings. Authors of accepted submissions will be invited to present lightning talks and/or posters during the workshop.
Track 3: Ongoing or published work on NLP for Wikimedia (non-archival)
This track invites submissions at the intersection of Wikimedia projects and NLP (broadly construed). Submissions can include already published work, research in preparation for publication, or ongoing studies. All submissions are non-archival and will be reviewed by the workshop organizers for topical relevance and correctness.
For this track, submissions can be made in any template or format but should not exceed 20 pages (including references, figures, and tables). Submissions must be in English and uploaded as a PDF file to receive full consideration. Accepted submissions will not be included in the workshop proceedings. However, accepted articles will be listed on the workshop website and may be featured during the event.
Papers must be submitted through OpenReview. Please add [PUBLISHED] at the beginning of the title on the submission page so we know that you are submitting to this track. A completed paper checklist (see below) must also be appended.
Paper checklist
Here are the questions we ask participants to address in their papers for Tracks 2 and 3. Please include your responses to the checklist in your paper; they will not count toward the page limit.
Benefits:
- How does this work support the Wikimedia community?
- What license are you using for your data, code, and models? Are they available for community reuse?
- Did you provide clear descriptions and rationale for any filtering that you applied to your data? For example, did you filter to just one language (e.g., English Wikipedia) or many? Did you filter to any specific geographies or topics?
Risks:
- If there are risks from your work, do any of them apply specifically to Wikimedia editors or the projects?
- Did you name any Wikimedia editors (including username) or provide information exposing an editor's identity?
- Could your research be used to infer sensitive data about individual editors? If so, please explain further.
Here are some related readings that offer context on these questions:
- Do Know Harm: Considering the Ethics of Online Community Research
- Participant Perceptions of Twitter Research Ethics
- Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing
- Ways of Knowing When Research Subjects Care
- Wikimedia Research Best Practices Around Privacy Whitepaper