Research:Monthly wikimedia editor activity dataset
The page documents a dataset that describes the activity levels of registered editors across Wikimedia's projects. This dataset contains a row for every (wiki, user, month) that contains a count of all 'revisions' saved and a count of those revisions that were 'archived' when the page was deleted.
Halfaker, Aaron (2015): Wikimedia editor activity (monthly). figshare.
dx.doi.org/10.6084/m9.figshare.1553296
Methods
editThe Wikimedia internal database replicas were queried. Two queries were deployed against all wikis and then joined. The last complete month in the dataset is June, 2015.
Editor months
|
---|
SELECT wiki, month, user_id, user_name, user_registration, SUM(revisions * archived) AS archived, SUM(revisions) AS revisions FROM ( SELECT LEFT(rev_timestamp, 6) AS month, DATABASE() AS wiki, rev_user AS user_id, FALSE AS archived, COUNT(*) AS revisions FROM revision WHERE rev_timestamp >= "201503" GROUP BY LEFT(rev_timestamp, 6), rev_user UNION ALL SELECT LEFT(ar_timestamp, 6) AS month, DATABASE() AS wiki, ar_user AS user_id, TRUE AS archived, COUNT(*) AS revisions FROM archive WHERE ar_timestamp >= "201503" GROUP BY LEFT(ar_timestamp, 6), ar_user ) AS editor_months INNER JOIN user USING (user_id) GROUP BY wiki, month, user_id ORDER BY wiki, month; |
Local user info (attached_method)
|
---|
SELECT DATABASE() AS wiki, user_id AS user_id, user_registration AS user_registration, gu_id AS globaluser_id, lu_attached_timestamp AS user_attached, lu_attached_method AS attached_method FROM user LEFT JOIN centralauth.localuser ON lu_wiki = DATABASE() AND lu_name = user_name LEFT JOIN centralauth.globaluser ON gu_name = lu_name GROUP BY user_id; |
Sample
edit- wiki -- The dbname of the wiki in question ("enwiki" == English Wikipedia, "commonswiki" == Commons)
- month -- YYYYMM
- user_id -- The user's identifier in the local wiki
- user_name -- The user name in the local wiki (from the 'user' table)
- user_registration -- The recorded registration date for the user in the 'user' table
- archived -- The count of deleted revisions saved in this month by this user
- revisions -- The count of all revisions saved in this month by this user (archived or not)
- attached_method -- The method by which this user attached this account to their global account
wiki | month | user_id | user_name | user_registration | archived | revisions | attached_method |
---|---|---|---|---|---|---|---|
aawiki | 200404 | 3 | Angela | NULL | 2 | 2 | password |
aawiki | 200406 | 2 | Tim Starling | NULL | 10 | 10 | password |
aawiki | 200408 | 2 | Tim Starling | NULL | 2 | 2 | password |
aawiki | 200409 | 14 | Davidcannon | NULL | 2 | 2 | primary |
aawiki | 200412 | 2 | Tim Starling | NULL | 12 | 12 | password |
aawiki | 200501 | 21 | Rich Farmbrough | NULL | 16 | 16 | password |
aawiki | 200502 | 23 | ` | NULL | 2 | 2 | primary |
aawiki | 200508 | 62 | Afar god | NULL | 6 | 10 | primary |
... | |||||||
enwiki | 200101 | 9161929 | KlausSeistrup | 20090308075444 | 0 | 1 | new |
enwiki | 200101 | 9167505 | ALittleLuck | 20090309011538 | 0 | 1 | new |
enwiki | 200101 | 10164500 | IvoryRing | 20090722031343 | 0 | 1 | new |
enwiki | 200101 | 10164531 | StasK | 20090722031938 | 1 | 2 | primary |
enwiki | 200101 | 10164597 | ChessyPig | 20090722033150 | 0 | 1 | new |
enwiki | 200101 | 11327310 | Dhcp058.246.lvcm.com | 20091230114723 | 2 | 3 | new |
Note that some users who registered their accounts before 2006 may have strange 'user_registration' dates because that field was not tracked in a reasonable way until 2006. See bugzilla:22097.
Summary analysis
editComing soon