Research:Top edited Portuguese Wikipedia articles in 2016
Based on similar research on Wikipedia in other languages, this project aims to present a list of the most edited pages on Portuguese Wikipedia.
Methods
Currently, all Wikipedia projects run on MediaWiki software, maintained by the Wikimedia Foundation. Whenever an edit is made, a new record is added to the revision table of the database, describing that edit (when it was made, on which page, by whom, what changed, etc.). Knowing this, we measure the number of edits by counting the records in the revision table for each page during 2016. This requires access to the Portuguese Wikipedia database. A full backup of the database can be downloaded from Wikimedia Downloads, or queries can be run online through Quarry.
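As a minimal sketch of this counting step, the snippet below tallies edits per page in Python. The revision records and page titles are made up for illustration and are not taken from the database; the only fact assumed about MediaWiki is that revision timestamps are stored as 14-character strings in the form YYYYMMDDHHMMSS, so a year can be selected by its prefix.

```python
from collections import Counter

# Hypothetical revision records: (page_title, rev_timestamp).
# MediaWiki stores timestamps as 14-character strings, YYYYMMDDHHMMSS.
revisions = [
    ("Brasil", "20160102030405"),
    ("Brasil", "20160615120000"),
    ("Portugal", "20161231235959"),
    ("Portugal", "20170101000000"),  # outside 2016, ignored
    ("Brasil", "20151231235959"),    # outside 2016, ignored
]


def count_edits_in_year(records, year):
    """Count revisions per page whose timestamp starts with the given year."""
    prefix = str(year)
    return Counter(title for title, ts in records if ts.startswith(prefix))


edit_counts = count_edits_in_year(revisions, 2016)
print(edit_counts)
```

Each record contributes one edit to its page, which is the same idea as counting rows of the revision table grouped by page.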
This approach is similar to Top edited English Wikipedia articles in 2016, but since the query needs to be run only once, performance was not a concern here. The result of this query is therefore exactly the main objective of this project.
Query executed in the database (SQL language)
|
---|
USE ptwiki_p;
-- User variables used to compute a rank in which ties share the same position
SET @total_edit_rank = 0;
SET @prev_edit_count = 0;
SELECT
    t.rank
  , REPLACE(t.page_title, '_', ' ') page_title
  , t.total_edits
FROM
(
    -- Assign the rank: it only increases when the edit count changes
    SELECT
        t.*
      , @total_edit_rank := @total_edit_rank + IF(@prev_edit_count=t.total_edits, 0, 1) AS rank
      , @prev_edit_count := t.total_edits
    FROM
    (
        -- Count the 2016 revisions of each page in the main namespace
        SELECT
            page_title
          , COUNT(*) AS total_edits
        FROM
            revision
            INNER JOIN page
                ON (revision.rev_page = page.page_id)
        WHERE
            revision.rev_timestamp LIKE '2016%'
            AND page.page_namespace = 0
        GROUP BY
            page.page_title
    ) t
    ORDER BY
        t.total_edits DESC
      , t.page_title ASC
) t
WHERE
    (t.rank <= 100)
;
|
In this approach, we list the 100 most edited pages, allowing multiple pages to share the same position in the rank when they have the same number of edits (so the number of listed pages may vary, but it is guaranteed to be equal to or greater than 100). We also restrict the results to the main namespace, that is, to article pages only.[note 1]
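The tie handling described above can be sketched in Python as follows. This is an illustration only, not the code that produced the results: the function name `dense_rank`, the sample titles and the edit counts are all hypothetical. It mirrors the query's ordering (edits descending, then title ascending) and increases the rank only when the edit count changes.

```python
def dense_rank(counts, limit):
    """Return (rank, title, edits) rows for every page whose rank is <= limit.

    Pages with the same edit count share a rank, and the next distinct
    count receives the next consecutive rank.
    """
    # Sort by edits descending, then by title ascending.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    rank = 0
    previous = None
    for title, edits in ordered:
        if edits != previous:
            rank += 1
            previous = edits
        if rank > limit:
            break
        # Underscores in stored titles are shown as spaces.
        rows.append((rank, title.replace("_", " "), edits))
    return rows


# Hypothetical edit counts, for illustration only.
sample = {"Page_A": 50, "Page_B": 40, "Page_C": 40, "Page_D": 30, "Page_E": 10}
top = dense_rank(sample, limit=3)
for row in top:
    print(row)
```

With a limit of 3, this sample yields four rows because two pages tie at rank 2, which is why the final list can contain more pages than the rank limit.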
Policy, Ethics and Human Subjects Research
Although editors' information is publicly available, their usernames and IP addresses are not the subject of this research project, so editors cannot be identified from the results published here. Usernames and IP addresses are stored in the queried table of the database, but they are not disclosed for any purpose at any point in the process (not even to people involved in this project, or to people aiming to reproduce the results presented here).
Results
The table below presents the results extracted from the database. Nothing was changed, apart from formatting it as a wikitable with wikilinks (big thanks to TablesGenerator.com).
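For readers who want to reproduce the formatting step without an online tool, the following Python sketch turns result rows into wikitable markup. It is not how the table on this page was produced; the function name `to_wikitable`, the column headers, the sample row and the `w:pt:` link prefix are assumptions made for illustration.

```python
def to_wikitable(rows):
    """Render (rank, title, edits) rows as MediaWiki table markup."""
    # Column headers are illustrative; adjust them to the actual table.
    lines = ['{| class="wikitable sortable"', "! Rank", "! Article", "! Edits"]
    for rank, title, edits in rows:
        lines.append("|-")
        lines.append(f"| {rank}")
        # Assumed interwiki prefix pointing to Portuguese Wikipedia.
        lines.append(f"| [[w:pt:{title}|{title}]]")
        lines.append(f"| {edits}")
    lines.append("|}")
    return "\n".join(lines)


# Hypothetical row, for illustration only.
markup = to_wikitable([(1, "Exemplo", 10)])
print(markup)
```

The output can be pasted into a wiki page, where each title becomes a link to the corresponding article.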
Technical info and other considerations
Run times and server info
|
---|
The query presented here was executed on Wikimedia Tool Labs servers on Feb 17, 2017, at 10:00 PM (UTC). Its run times were:
real 1m26.288s
user 0m0.003s
sys 0m0.015s
Other relevant information about the client and server instances:
$ uname -a
Linux tools-bastion-03 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ mysql --version
mysql Ver 15.1 Distrib 5.5.54-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
$ ./sql enwiki 'SHOW VARIABLES LIKE "%version";'
+------------------+-----------------+
| Variable_name    | Value           |
+------------------+-----------------+
| innodb_version   | 5.6.21-70.0     |
| protocol_version | 10              |
| tokudb_version   | tokudb-7.5.3    |
| version          | 10.0.15-MariaDB |
+------------------+-----------------+
|
Notes
- ↑ Portuguese Wikipedia has several namespaces for various purposes (user pages, user talk pages, discussion pages, help pages, etc.). Articles, however, exist only in the main namespace, and the main namespace contains nothing but articles.