Research:Blocks on the English-language Wikipedia

This page documents a completed research project.


Key Personnel

Project Summary

A vast amount of research has explored how Wikipedia defends itself against ill-intentioned users. Most of it has focused on vandalism (bad-faith edits made by those users) and the most direct consequence of those edits, reversion. The objective of this research project is to explore the ultimate consequence of poor intentions, namely blocking. After investigating trends in the overall block rate, we take the block logs from 2006 to the present and use a series of regular expressions to assign each block to one of six categories:

  1. Spam: blocks for using Wikipedia for advertising purposes;
  2. Disruption: blocks for BLP violations, defamation, personal attacks, threats (legal or otherwise), copyright violations, edit warring and POV-pushing, broadly construed;
  3. Sockpuppetry: the use of multiple accounts in violation of Wikipedia's policies, or long-term, multiple-account abuse of Wikipedia;
  4. Username blocks: blocks for violating Wikipedia's username policies;
  5. Proxy usage: the blocking of proxies;
  6. Misc: blocks for reasons not identified by the regular expressions.
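
The categorisation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the patterns, the category keys and the `categorise_block` helper are all hypothetical, chosen only to mirror the six category descriptions. It assumes each block carries a free-text reason and that the first matching pattern decides the category.

```python
import re

# Hypothetical patterns (NOT the project's real regular expressions).
# They are checked in order and the first match wins.
CATEGORY_PATTERNS = [
    ("spam", re.compile(r"spam|advertis", re.I)),
    ("sockpuppetry", re.compile(r"sock|multiple accounts|long[- ]term abuse", re.I)),
    ("username", re.compile(r"user\s*name", re.I)),
    ("proxy", re.compile(r"prox(y|ies)", re.I)),
    ("disruption", re.compile(
        r"\bblp\b|defam|personal attack|threat|copyright|edit[- ]?warr|\bpov\b|disrupt",
        re.I)),
]


def categorise_block(reason: str) -> str:
    """Return the first category whose pattern matches the block reason."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(reason):
            return category
    # Reasons that no pattern identifies fall into the catch-all category.
    return "misc"


if __name__ == "__main__":
    for reason in ["Spam / advertising-only account", "Edit warring", "per request"]:
        print(reason, "->", categorise_block(reason))
```

Because the first match wins, the order of the patterns matters: a reason that mentions both advertising and a username violation would be counted as spam here. Any real implementation would need to decide how to handle such overlaps.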

The resulting data is then examined and compared against potential confounds of the block rate (such as registration rates or AbuseFilter hits) in an attempt to answer three core research questions:

  1. Has there been any noticeable shift in the types of actions that users are blocked for?
  2. Has there been any noticeable shift in the overall block rate?
  3. If either is true: why?
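
One simple way to set block counts against a potential confound such as registration rates is to normalise one by the other, month by month. The sketch below is only an illustration of that idea, not the method used in this project. It assumes timestamps are strings in the `YYYYMMDDHHMMSS` form, and both helper names are hypothetical.

```python
from collections import Counter


def monthly_counts(timestamps):
    """Count events per month, keyed by the YYYYMM prefix of each timestamp."""
    return Counter(ts[:6] for ts in timestamps)


def blocks_per_thousand(block_timestamps, registration_timestamps):
    """Blocks per 1,000 registrations for each month that has registrations."""
    blocks = monthly_counts(block_timestamps)
    registrations = monthly_counts(registration_timestamps)
    return {
        month: 1000 * blocks[month] / registrations[month]
        for month in sorted(blocks)
        if registrations[month] > 0  # skip months with nothing to divide by
    }
```

If the normalised rate stays flat while raw block counts fall, the shift may simply track the confound; if the normalised rate also moves, something else is likely to be involved.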

Results

Shifts in the rate and type of user blocks

The first task is to investigate whether there have been any shifts in the rate and type of user blocks. Bearing in mind that the actions of one group inevitably impact the other, the dataset was split into two groups prior to analysis: one consisting of blocks of anonymous users, and one consisting of blocks of registered users. In both cases, data was gathered primarily from the logging table and consists of all block actions between January 2006 and September 2013, excluding unblocks and modifications of existing blocks.
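
The filtering and splitting described above might look something like the following sketch. It assumes each row is a dict using the MediaWiki logging table's column names (`log_type`, `log_action`, `log_title`, `log_timestamp`), that timestamps are `YYYYMMDDHHMMSS` strings, and that a block target that parses as an IP address or range denotes an anonymous user. The helper names and the exact window boundaries are illustrative assumptions, not the project's own code.

```python
import ipaddress

# Assumed study window: January 2006 up to and including September 2013.
WINDOW_START = "20060101000000"
WINDOW_END = "20131001000000"  # exclusive upper bound


def is_anonymous_target(target: str) -> bool:
    """Return True if the block target parses as an IP address or CIDR range."""
    try:
        ipaddress.ip_network(target.strip(), strict=False)
        return True
    except ValueError:
        return False


def split_blocks(log_rows):
    """Keep new blocks inside the window and split them by target type."""
    anonymous, registered = [], []
    for row in log_rows:
        # Keep only new blocks; this drops 'unblock' and 'reblock'
        # (modification of an existing block) entries.
        if row["log_type"] != "block" or row["log_action"] != "block":
            continue
        if not (WINDOW_START <= row["log_timestamp"] < WINDOW_END):
            continue
        if is_anonymous_target(row["log_title"]):
            anonymous.append(row)
        else:
            registered.append(row)
    return anonymous, registered
```

Comparing timestamps as strings works here only because the assumed format is fixed-width and ordered from year down to second.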

Overall shifts

Proportionate shifts

Exploring declines

Sudden decline (2008-2009)

Constant decline (2009-2013)

References

Conclusion