WSoR datasets/reverted

The reverted dataset contains information about reverted revisions, who they were reverted by, which revision was reverted back to and whether the revert was for vandalism.

Location

edit

db42:halfak.reverted_20110115

Fields

edit
halfak@internproxy:~/Sandbox/wsor$ mysql -h db42 -e "EXPLAIN reverted_20110115;SELECT * FROM reverted_20110115 LIMIT 3" halfak
+----------------+----------------+------+-----+---------+-------+
| Field          | Type           | Null | Key | Default | Extra |
+----------------+----------------+------+-----+---------+-------+
| rev_id         | int(11)        | YES  | MUL | NULL    |       |
| username       | varbinary(255) | YES  |     | NULL    |       |
| user_id        | int(11)        | YES  |     | NULL    |       |
| comment        | varbinary(255) | YES  |     | NULL    |       |
| rvtg_id        | int(11)        | YES  | MUL | NULL    |       |
| rvtg_username  | varbinary(255) | YES  |     | NULL    |       |
| rvtg_user_id   | int(11)        | YES  |     | NULL    |       |
| rvtg_comment   | varbinary(255) | YES  |     | NULL    |       |
| rvtto_id       | int(11)        | YES  |     | NULL    |       |
| rvtto_username | varbinary(255) | YES  |     | NULL    |       |
| rvtto_user_id  | int(11)        | YES  |     | NULL    |       |
| rvtto_comment  | varbinary(255) | YES  |     | NULL    |       |
| is_vandalism   | tinyint(1)     | YES  | MUL | NULL    |       |
| revs_reverted  | int(11)        | YES  |     | NULL    |       |
+----------------+----------------+------+-----+---------+-------+
+-------------+----------------+---------+-----------------------------------------------------+-----------+---------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------+---------------+-----------------------------------------------------+--------------+---------------+
| rev_id      | username       | user_id | comment                                             | rvtg_id   | rvtg_username | rvtg_user_id | rvtg_comment                                                                                                                                     | rvtto_id  | rvtto_username | rvtto_user_id | rvtto_comment                                       | is_vandalism | revs_reverted |
+-------------+----------------+---------+-----------------------------------------------------+-----------+---------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------+---------------+-----------------------------------------------------+--------------+---------------+
|   261295963 | 88.109.188.145 |       0 |                                                     | 261309052 | Numyht        |      4660442 | [[WP:UNDO|Undid]] revision 261295963 by [[Special:Contributions/88.109.188.145|88.109.188.145]] ([[User talk:88.109.188.145|talk]]) rv unsourced | 260061081 | Red Sismey     |       1991048 | /* Acting */                                        |            0 |             1 |
|   269434849 | 75.53.209.45   |       0 | /* Total Nonstop Action Wrestling (2008-present) */ | 269442144 | JakeDHS07     |       743183 | /* Total Nonstop Action Wrestling (2008-present) */ week by week as well as incorrect                                                            | 269376850 | PCE            |       2202642 | /* Total Nonstop Action Wrestling (2008-present) */ |            0 |             1 |
|   280667690 | 86.44.89.61    |       0 |                                                     | 280684923 | TheFBH        |      5122755 | [[WP:UNDO|Undid]] revision 280667690 by [[Special:Contributions/86.44.89.61|86.44.89.61]] ([[User talk:86.44.89.61|talk]])                       | 279423089 | TheFBH         |       5122755 | /* In wrestling */                                  |            0 |             1 |
+-------------+----------------+---------+-----------------------------------------------------+-----------+---------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------+---------------+-----------------------------------------------------+--------------+---------------+

Each row represents a reverted revision.

  • Reverted Revision - One of the reverted revisions
    • rev_id: The identitifier of the reverted revision PRIMARY KEY
    • username: The username of the editor who created the reverted revision (same as revision.rev_user_text)
    • user_id: The identifier of the editor who created the reverted revision (same as revision.rev_user)
    • comment: The comment left by the reverted editor (same as revision.rev_comment)
  • Reverting Revision - The revision that performed the revert.
    • rvtg_id: The identitifier of the reverting revision
    • rvtg_username: The username of the editor who performed the revert (same as revision.rev_user_text)
    • rvtg_user_id: The identifier of the editor who performed the revert (same as revision.rev_user_text)
    • rvtg_comment: The comment left by the reverting editor
  • Reverted to Revision - The revision that was re-instated by the reverting edit.
    • rvtto_id: The identitifier of the revision that was reverted back to
    • rvtto_username: The username of the editor who created the reverted to revision
    • rvtto_user_id: The identifier of the editor who created the reverted to revision
    • rvtto_comment: The comment left by the editor who created the reverted to revision
  • Meta
    • is_vandalism: True when rvtg_comment matched D_LOOSE/D_STRICT regexp
    • revs_reverted: The total number of revisions that were reverted in this action.

Reproduction

edit

To reproduce this dataset, install Wikimedia Utilities and run this script.

For example:

$ python reverts.py --output_prefix=test1/ enwiki.*.7z
Aug-29 16:51:36 INFO     Starting test1/ run...
Aug-29 16:51:36 INFO     Creating output file: test1/revert.tsv
Aug-29 16:51:36 INFO     Creating output file: test1/reverted.tsv
Aug-29 16:51:36 INFO     Prcoessing...
Aug-29 16:51:36 INFO     Processing dump file enwiki-20110829-pages-meta-history.sample.xml.
|.||||..|.|....|......|........|..........|...........|.|.|.|.|...|.|||...|.|.|

This will produce two output files that can be loaded into the database test1/revert.tsv and test1/reverted.tsv. Create the table to load into:

CREATE TABLE reverted (
	rev_id        INT UNSIGNED,
	rvtg_id       INT UNSIGNED,
	rvtto_id      INT UNSIGNED,
	is_vandalism  BOOL,
	revs_reverted INT(2)
);

This gets the base dataset into the MySQL so it can be denormalized. To denormalize, run the following command:

CREATE TABLE reverted_denorm
SELECT
	rvtd.rev_id,
	rvtd.rev_user_text as username,
	rvtd.rev_user as user_id,
	rvtd.rev_comment as comment,
	rvtg.rev_id as rvtg_id,
	rvtg.rev_user_text as rvtg_username,
	rvtg.rev_user as rvtg_user_id,
	rvtg.rev_comment as rvtg_comment,
	rvtto.rev_id as rvtto_id,
	rvtto.rev_user_text as rvtto_username,
	rvtto.rev_user as rvtto_user_id,
	rvtto.rev_comment as rvtto_comment,
	reverted.is_vandalism,
	reverted.revs_reverted
FROM reverted
LEFT JOIN revision rvtd
	ON reverted.rev_id = rvtd.rev_id
LEFT JOIN revision rvtg
	ON reverted.rvtg_id = rvtg.rev_id
LEFT JOIN revision rvtto
	ON reverted.rvtto_id = rvtto.rev_id;

Finally, move the denormalized table over the original.

DROP TABLE reverted;
ALTER TABLE reverted_denorm RENAME TO reverted;

Notes

edit

The current dataset is complete up to the January 2011 dump.