User:Neil Shah-Quinn (WMF)/Data portal draft
There is a great deal of publicly available, openly licensed data about Wikimedia projects. This page is intended to help community members, developers, and researchers who want to analyze raw data learn what data and infrastructure are available. If you have any questions, you might find the answer in the Frequently Asked Questions about Data.
If you wish to browse pre-computed metrics and dashboards, see statistics.
If this publicly available data isn't sufficient, you can look at the page on private data access to see what non-public data exists and how you can gain access.
If you wish to donate or document any additional data sources, you can use the Wikimedia organization on DataHub.
Quick glance
- Data Dumps: Dumps of all WMF projects for backup, offline use, research, etc.
- API: The API provides direct, high-level access to the data contained in MediaWiki databases through HTTP requests to the web service.
- Toolforge: Toolforge allows you to connect to shared server resources and query a copy of the database (with some lag).
- Recent changes stream: Wikimedia broadcasts every change to every Wikimedia wiki using the Socket.IO protocol.
- Analytics Dumps: Raw pageview, unique device estimates, mediacounts, etc.
- WikiStats: Reports in 25+ languages based on data dumps and server log files.
- DBpedia: DBpedia extracts structured data from Wikipedia and allows users to run complex queries and link Wikipedia data to other data sets.
- DataHub and Figshare: A collection of various Wikimedia-related datasets.
Readership data
- the pageviews API
- unique devices API and dumps
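As a rough illustration of the pageviews API listed above, the following Python sketch builds a per-article request URL and reduces a response to a simple mapping. The helper names (`pageviews_url`, `daily_views`) are hypothetical, and the defaults for `access`, `agent`, and `granularity` are assumptions chosen for the example rather than recommendations from this page.

```python
from urllib.parse import quote

# Base path of the per-article pageviews endpoint in the Wikimedia REST API.
PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end,
                  access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    `start` and `end` are dates in YYYYMMDD form, e.g. "20230101".
    """
    # Titles use underscores instead of spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([PAGEVIEWS_BASE, project, access, agent,
                     title, granularity, start, end])


def daily_views(response_json):
    """Map each timestamp in a decoded response to its view count."""
    return {item["timestamp"]: item["views"]
            for item in response_json.get("items", [])}
```

The URL can then be fetched with any HTTP client and the decoded JSON passed to `daily_views`. Wikimedia asks API clients to send a descriptive `User-Agent` header.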
Editing metadata
Editing metadata includes information such as the user, timestamp, and revision comment, but does not include the content of the revision itself.
This data is available from:
- the action API
- the XML data dumps
- the replicas of the MediaWiki databases available on Toolforge
- Recent changes stream
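For the action API route, a minimal sketch of requesting editing metadata without revision content might look like this. The function names are hypothetical, the English Wikipedia endpoint is only an example, and the selection of `rvprop` fields is an assumption about which metadata is of interest.

```python
from urllib.parse import urlencode


def revision_metadata_url(title, limit=5,
                          api="https://en.wikipedia.org/w/api.php"):
    """Build an action API URL for recent revision metadata (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # Metadata only: no "content" in rvprop, so no revision text is returned.
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,  # version 2 returns "pages" as a list
    }
    return api + "?" + urlencode(params)


def extract_revisions(response_json):
    """Flatten a formatversion=2 response into (revid, timestamp, user, comment) tuples."""
    pages = response_json.get("query", {}).get("pages", [])
    return [
        (rev["revid"], rev["timestamp"], rev.get("user"), rev.get("comment"))
        for page in pages
        for rev in page.get("revisions", [])
    ]
```

Fetching the URL and decoding the JSON body is left to whichever HTTP client you prefer. `extract_revisions` assumes the response shape produced by `formatversion=2`.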
Raw content data
Data that includes the raw content of page revisions is available from:
- the action API
- the XML data dumps
- the REST API
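To give a sense of what reading raw content from an XML dump involves, here is a small Python sketch that streams revisions from an export file. The `SAMPLE` document is a made-up miniature in the general shape of a MediaWiki export, not real dump data, and the helper names are hypothetical. The export schema version in the namespace varies between dumps, so the sketch ignores namespaces.

```python
import io
import xml.etree.ElementTree as ET

# Made-up miniature export, used only to exercise the parser below.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>1</id>
    <revision>
      <id>10</id>
      <timestamp>2020-01-01T00:00:00Z</timestamp>
      <text>Hello, '''world'''.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_texts(xml_file):
    """Yield (page title, revision wikitext) pairs from an export file object."""
    title = None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            text = ""
            for child in elem:
                if local_name(child.tag) == "text":
                    text = child.text or ""
            yield title, text
        elif name == "page":
            elem.clear()  # drop the finished page's children to limit memory use


revisions = list(iter_revision_texts(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Published dumps are typically compressed. A file object from `bz2.open(path, "rb")` can be passed to `iter_revision_texts` in place of the in-memory sample.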
Structured content data
- Wikidata Query Service
- DBpedia
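As an illustration of querying structured content, the sketch below prepares a SPARQL request for the Wikidata Query Service and pulls values out of a standard SPARQL JSON result. The helper names are hypothetical. The example query (items that are an instance of house cat) is only a placeholder for whatever you actually want to ask.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Placeholder query: five items whose "instance of" (P31) is "house cat" (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def sparql_url(query):
    """Build a GET URL asking the query service for JSON results (hypothetical helper)."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query.strip(), "format": "json"})


def binding_values(response_json, variable):
    """Collect the values bound to one variable in a SPARQL JSON result."""
    return [row[variable]["value"]
            for row in response_json["results"]["bindings"]
            if variable in row]
```

After fetching `sparql_url(EXAMPLE_QUERY)` with an HTTP client, `binding_values(result, "itemLabel")` would list the labels returned by the service.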
Miscellaneous data
Analysis infrastructure
In addition to the raw data described above, a great deal of infrastructure for research and analysis is available to people contributing to Wikimedia's mission.