Web2Cit: collaborative automatic citations for web sources

Research group

  • Nidia Hernández is a member of the Web2Cit Research Group (script development and writing)
  • Romina De León is a member of the Web2Cit Research Group (technical staff and editing)
  • Gimena del Rio Riande is a member of the Web2Cit Research Group (coordination and writing)



The main goal of the Research group is to measure and analyze the coverage gap of automatic citations in Wikipedia, using a script that will allow us to calculate the impact and accuracy of Web2Cit.



References are one of the main pillars on which Wikipedia is collaboratively constructed. To help Wikipedia editors insert and format references, Wikipedia's visual editor provides an automatic citation generator that produces a formatted citation from a URL, DOI, or other identifier of the cited source. However, this automatic tool does not always succeed in extracting citation metadata from web sources, mostly because these sources fail to embed the metadata appropriately. Until now, the only ways to fix this problem demanded either time or programming skills, ranging from manually correcting the errors to changing the underlying software code. Web2Cit is a tool that promises to lower the barriers to participation by providing a relatively simple way to collaboratively define extraction procedures.

But what is the actual performance of the current automatic tool, and how good is Web2Cit at doing what it promises? In this research project we extracted citations from featured articles in the Spanish (SP), English (EN), French (FR), and Portuguese (PT) Wikipedias and compared them against automatically generated citations to estimate this performance. We found that, on average, the automatic generator returned the expected result for 60% of citation fields. In addition, we made available a script that will let us repeat this analysis in the future, once the Web2Cit tool has been adopted by the Wikipedia community and users.
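The field-level comparison described above can be illustrated with a minimal Python sketch. It is not the project's script: the function names, the sample citations, and the matching rule (exact match after collapsing whitespace and ignoring case) are assumptions made only for illustration, and the actual analysis may score fields differently.

```python
def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences are not counted as errors."""
    return " ".join(value.split()).casefold()


def field_accuracy(expected: dict[str, str], generated: dict[str, str]) -> float:
    """Return the share of expected citation fields that the generator reproduced.

    Assumption: a field counts as correct only when both values are equal
    after normalization; a missing generated field counts as a miss.
    """
    if not expected:
        raise ValueError("expected citation has no fields to compare")
    matches = sum(
        1
        for field, value in expected.items()
        if normalize(generated.get(field, "")) == normalize(value)
    )
    return matches / len(expected)


# Hypothetical data: a citation as written by editors vs. the generator's output.
manual = {
    "title": "An Example Article",
    "author": "Jane Doe",
    "date": "2020-05-01",
    "publisher": "Example News",
}
automatic = {
    "title": "An example  article",  # differs only in case and spacing
    "author": "",                    # the generator found no author
    "date": "2020-05-01",
    "publisher": "Example News",
}

print(field_accuracy(manual, automatic))  # 3 of 4 fields match: 0.75
```

Averaging this score over many citations gives a per-field success rate of the kind reported above, under the stated matching assumption.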

To continue reading, see the project report: Web2Cit Research Group. Final Report.

Automatic Web2Cit tests


In addition, we developed a script that automatically creates Web2Cit translation tests from the citation metadata that we extracted from Wikipedia featured articles.

These automatically created translation tests may help Web2Cit contributors identify websites that have problems with Citoid and fix them with Web2Cit. So far, this script has been used to generate translation tests for 36 highly cited, low-performing website domains. The list of automatically created tests is available here.

These tests have been uploaded to the Web2Cit storage, where they are already available to Web2Cit's server and monitor.
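As a rough illustration of how translation tests might be derived from extracted citation metadata, the Python sketch below groups citations by domain and turns each one into a test entry with an expected value per field. The `build_tests` function, the input layout, and the output structure (`path`, `fields`, `fieldname`, `goal`) are assumptions for this example; they are not guaranteed to match the actual script or the Web2Cit storage schema.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_tests(citations: list[dict]) -> dict[str, list[dict]]:
    """Group citations by domain and build one test entry per cited URL.

    Assumption: each citation is a dict with a "url" string and a "metadata"
    dict mapping field names to the values found in Wikipedia.
    """
    tests = defaultdict(list)
    for citation in citations:
        parts = urlsplit(citation["url"])
        if not parts.hostname:
            continue  # skip entries that are not absolute URLs
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Keep only fields that actually have an expected value.
        fields = [
            {"fieldname": name, "goal": [value]}
            for name, value in citation["metadata"].items()
            if value
        ]
        if fields:
            tests[parts.hostname].append({"path": path, "fields": fields})
    return dict(tests)


# Hypothetical citations extracted from featured articles.
citations = [
    {
        "url": "https://www.example.org/news/1",
        "metadata": {"title": "First story", "date": "2021-03-04"},
    },
    {
        "url": "https://www.example.org/news/2?lang=es",
        "metadata": {"title": "Segunda nota", "date": ""},
    },
]

tests = build_tests(citations)
print(tests["www.example.org"][1])
```

Grouping by domain reflects the idea that tests are reviewed per website, so contributors can see at a glance which domains have expected values that the automatic generator fails to reproduce.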

Deliverables of the Research group


Other Resources
