Research:Spambot detection system to support stewards

Tracked in Phabricator:
Task T288338
Duration:  2021-July – ??
knowledge integrity, spam detection
This page documents a completed research project.


Stewards are the group of users with the highest levels of rights and permissions across Wikimedia projects, including global blocks, global locks and global bans, which are indispensable for combatting cross-wiki abuse. Despite their critical role in governance and content moderation, stewards' workflows barely rely on advanced tools. To increase the efficiency of stewards in spambot detection, this research project will investigate the potential of computational approaches.

Introduction

While much of the research on Wikipedia governance and content moderation has focused on local admins of specific Wikipedia projects, the workflows and challenges of stewards have remained understudied. Stewards are responsible for tasks as crucial as granting newly-appointed functionaries their permissions, or global locks/bans (accounts) and global blocks (IPs) that are indispensable for combatting cross-wiki abuse. Although both global locks and bans prevent accounts from editing, there are relevant differences between these two processes, shown in the following table.


| | Global lock | Global ban |
|---|---|---|
| Type | Technical | Social |
| Target | Account (though if an account is globally locked, other accounts by that editor are likely to also be locked) | Editor (who could manage one or several accounts) |
| Consequence | Inability to log in and invalidation of current login sessions. | Revocation of some or all privileges at all Wikimedia projects. This could focus on editing privileges, although bans are usually enforced with a lock. |
| Procedure | A formal report is submitted via the Steward Requests/Global noticeboard or the IRC channel. Then a steward examines whether the behavior of the suspected account matches established criteria such as cross-wiki vandalism, spam, or proven long-term abuse. Finally, if the investigation shows evidence of such behaviors, they apply a global lock. | A Requests for Comment process for a global ban is initiated on Meta by a valid nominator. If all criteria for a global ban are met and consensus to ban is reached after discussion (including the reported editor), the RfC is closed and a steward applies a global ban. |

In the case of global locks, matches are sometimes made explicit through AbuseFilter filters as they speed up the investigation and represent an effective way to formalize the experience and judgment of stewards. Also, as spambots usually add links promoting a product or hosting malware, the SpamBlacklist extension can be used to prevent future edits containing unwanted keywords or URLs from specific domains.

Another key characteristic of spambots is that they often create a large number of accounts. As a consequence, stewards need to examine spambot patterns in other accounts associated with the IPs of the suspected one. The behavior of the reported account and that of the associated accounts are compared qualitatively using the CheckUser extension. A previous investigation on the spambot block workflow by Claudia Lo found this to be the most complex and time-consuming task of stewards because:

  1. they need to temporarily grant themselves CheckUser permissions wiki by wiki,
  2. there is no systematized comparison tool of spambot patterns.

To support stewards in spambot detection, this project will examine computational approaches that will speed up this critical content moderation process.

Methodology

A literature review of about a dozen research papers on spam detection (Wikipedia and social media) has been conducted to identify relevant features in the state of the art. The simplest approach to the spambot detection challenge would be to build a supervised binary classifier based on similar features that predicts whether any editor in the Wikimedia ecosystem is a spambot. In fact, this is the common approach among the state-of-the-art models that were reviewed. However, I should note that false positives in this modeling approach have a major impact on community health and sustainability, as they attribute a malicious nature to benign editors. For that reason, the approach will address the spambot detection challenge by focusing not on editors but on links added to Wikimedia projects. That is, modeling approaches will aim to identify whether a given input tuple (URL, revision, Wikimedia project) is spambot activity.

In addition, the study design presented in this paper is inspired by core principles from a recent research work for sockpuppet detection in Wikimedia projects:

  • Simplicity and interpretability
  • Minimize language-specific features
  • Machine-in-the-loop (support existing human processes)
  • Balance risks

Datasets

I first collected a dataset of editors who are globally locked in the Wikimedia ecosystem. This was done by examining logs at MetaWiki, the central site for coordinating all Wikimedia projects and communities. In particular, I found over 600K log entries of global locks applied to editors from March 2009 to December 2021. As shown in the figure below, "spam-only account: spambot" is the most common moderation message provided by stewards in global locks (over 300K), followed by other forms of misbehaviour such as long-term abuse, cross-wiki abuse, vandalism or abusive user name.

 

Then, I created a dataset of the editors who were globally locked with a message matching the regular expression .*[Ss]pambot.* and who remained globally locked, as some editors were locked and then unlocked, sometimes multiple times. This dataset comprises 301,739 spambots who created 902,242 local accounts across Wikimedia projects. The figure below shows the distribution of spambots by the year of creation of their Wikimedia account. I found an increasing trend from 2016 to 2019. Since then, the number of editors globally locked as spambots has decreased. This could be the result of different causes, such as a lower presence of spambots in the Wikimedia ecosystem in recent years, improved techniques that make them more difficult to detect, or fewer resources/efforts devoted to spambot detection.
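The selection described above can be sketched as follows. This is a minimal illustration, not the code used in the project: the log schema (timestamp, account, action, comment), the function name, and the rule that only the most recent log entry of each account decides whether it is still locked are all assumptions made for the example. Only the regular expression comes from the text.

```python
import re

# Pattern quoted in the text above.
SPAMBOT_RE = re.compile(r".*[Ss]pambot.*")


def currently_locked_spambots(log_entries):
    """Return accounts whose most recent log entry is a lock with a spambot message.

    log_entries: iterable of (timestamp, account, action, comment) tuples, where
    timestamp is an ISO 8601 string and action is "lock" or "unlock".
    This schema is hypothetical and only serves the example.
    """
    latest = {}
    # Sorting by timestamp lets later entries overwrite earlier ones.
    for timestamp, account, action, comment in sorted(log_entries):
        latest[account] = (action, comment)
    return {
        account
        for account, (action, comment) in latest.items()
        if action == "lock" and SPAMBOT_RE.search(comment)
    }
```

With this rule, an account that was locked as a spambot and later unlocked is dropped, while an account that was unlocked and then locked again as a spambot is kept.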

 

The inspection of the spambots' local accounts has revealed that most of them have 0 edits. In fact, very few of these accounts have more than 1 edit. Furthermore, only ~44K of the ~300K spambots in the dataset had at least 1 edit (considering all their local accounts) and only ~1.5K spambots had edits in more than one Wikimedia project.

 

The above results were surprising since this project is motivated by the expected large presence of cross-wiki spambots requiring substantial efforts from stewards. The findings have been shared with current and former stewards who provided the following explanations:

  • Spambots usually make very few edits and rarely make cross-wiki edits. They tend to create multiple accounts from a single proxy IP address (many of them are never used) and each account shares spam (e.g., links) on a wiki of interest. In theory, visitors to Wikipedia using the same IP address can create up to 6 accounts per day.
  • Many spambots have no edits because their edit attempts hit either AbuseFilter or SpamBlacklist.
  • Many spambots are not Wikimedia focused and don’t try very hard to succeed. They just spread spam such as links, and if only one percent of one million links is published, that is already a lot of spam.

For the analysis of URLs, I initially collected a dataset of URLs in revisions of Wikimedia projects from different sources:

  • spambot_diff: diffs from visible revisions by editors globally locked as spambots.
  • spambot_deleted: edit summaries from deleted revisions of pages that were later deleted by an admin.

I carefully inspected samples of the dataset and found that some revisions did not resemble spambot activity. Therefore, I filtered the dataset to keep only the tuples (URL, revision, project) that meet several criteria. First, URL domains are neither .org, .gov, nor .edu, nor included in a curated exclusion list of known quality websites (e.g., bbc.com, bbcamerica.com, tribunnews.com, gov.uk, etc.). Second, revisions correspond to a page in a Wikipedia language edition, as I found that URLs added to projects like Wikivoyage or Wikinews were often not spam related. Third, spambots have fewer than 5 edits. This threshold was set after observing editors with remarkable editing activity who were blocked globally as spambots, while having been blocked locally on specific Wikimedia projects previously because of other forms of misbehavior such as vandalism or sockpuppeting.
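The three filtering criteria can be expressed as a single predicate. The sketch below is illustrative only: the function and constant names are hypothetical, the project is assumed to be identified by a family string such as "wikipedia", and the domain list contains just the examples named above, not the full curated list.

```python
from urllib.parse import urlsplit

EXCLUDED_TLDS = ("org", "gov", "edu")
# Only the examples named in the text; the full curated list is not reproduced here.
KNOWN_QUALITY_DOMAINS = ("bbc.com", "bbcamerica.com", "tribunnews.com", "gov.uk")
MAX_EDITS = 5  # editors must have fewer than this many edits


def domain_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def keep_tuple(url, project_family, editor_edit_count):
    """Apply the three criteria to one (URL, revision, project) tuple."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False  # no parseable host, nothing to classify
    # Criterion 1: top-level domain and curated list of known quality websites.
    if host.rsplit(".", 1)[-1] in EXCLUDED_TLDS:
        return False
    if any(domain_matches(host, d) for d in KNOWN_QUALITY_DOMAINS):
        return False
    # Criterion 2: only Wikipedia language editions.
    if project_family != "wikipedia":
        return False
    # Criterion 3: editors with fewer than 5 edits.
    return editor_edit_count < MAX_EDITS
```

For example, keep_tuple("https://www.bbc.com/news", "wikipedia", 1) is rejected by the first criterion, while a link to an unlisted .com domain added on Wikipedia by an editor with one edit is kept.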

After this filtering process, the spambot_diff and the spambot_deleted datasets contain 1928 and 5718 entries respectively. A control dataset of URLs added to Wikimedia projects was created by applying the same filtering process.

Modeling

Based on the review of the state of the art, I considered different features based on the URL, the revision, and the editor (spambot). Note that all the features are language independent, so that the models can be applied to this dataset of activity across multiple Wikipedia language editions.

  • URL-based features: URL length, and number of subdomain levels (e.g., en.wikipedia.org is 3).
  • Editor-based features: user name length, count and ratio of digits in the user name, count of leading digits in the user name, count of trailing digits in the user name, ratio of unique characters in the user name, hours since account creation, hours since first and previous edit of the user (value is 0 for first edits), user edit count.
  • Revision-based features: revision length diff, whether the namespace is content, whether the revision is minor, whether the revision reverts other revisions, hours since page creation, and hours since the previous edit on the page.
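Several of the features listed above can be computed from strings alone. The following sketch shows one possible way to derive the URL-based features and the user name features; the function names and dictionary keys are hypothetical, and only ASCII digits are counted as digits, which is an assumption of the example. The temporal and revision-based features need revision metadata and are not covered.

```python
from urllib.parse import urlsplit

DIGITS = "0123456789"  # assumption: only ASCII digits count as digits


def url_features(url):
    """URL length and number of dot-separated levels in the host name."""
    host = urlsplit(url).hostname or ""
    return {
        "url_length": len(url),
        # en.wikipedia.org has three levels: en, wikipedia, org
        "subdomain_levels": len(host.split(".")) if host else 0,
    }


def username_features(name):
    """Length, digit statistics and character diversity of a user name."""
    n = len(name)
    digit_count = sum(ch in DIGITS for ch in name)
    return {
        "name_length": n,
        "digit_count": digit_count,
        "digit_ratio": digit_count / n if n else 0.0,
        "leading_digits": n - len(name.lstrip(DIGITS)),
        "trailing_digits": n - len(name.rstrip(DIGITS)),
        "unique_char_ratio": len(set(name)) / n if n else 0.0,
    }
```

None of these computations depends on the language of the wiki, which matches the language-independence requirement stated above.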

With this list of features I built four supervised classification models: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN).

Results

I have tested the models for spambot activity across Wikipedia language projects with the two datasets. Precision, recall, and F1-scores were used to evaluate the performance of the models with 10-fold cross validation (score values are averaged).

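| Model | spambot_deleted Precision | spambot_deleted Recall | spambot_deleted F1 | spambot_diff Precision | spambot_diff Recall | spambot_diff F1 |
|---|---|---|---|---|---|---|
| LR | 0.82 | 0.95 | 0.88 | 0.83 | 0.79 | 0.81 |
| RF | 0.81 | 0.96 | 0.88 | 0.81 | 0.87 | 0.84 |
| SVM | 0.82 | 0.96 | 0.89 | 0.86 | 0.81 | 0.83 |
| KNN | 0.81 | 0.90 | 0.85 | 0.82 | 0.85 | 0.83 |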
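For readers unfamiliar with the evaluation protocol, the sketch below shows how precision, recall and F1 can be computed per fold and then averaged over 10 folds. It is a dependency-free illustration under assumptions: the folds are contiguous and unshuffled, labels are 1 for spambot activity and 0 for control, and fit_predict is a hypothetical callback standing in for any of the four classifiers. The source does not state how the folds were built.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def kfold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(X, y, fit_predict, k=10):
    """Average (precision, recall, F1) over k folds.

    fit_predict(X_train, y_train, X_test) must return predicted labels for X_test.
    """
    scores = []
    for train, test in kfold_indices(len(y), k):
        y_pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        scores.append(precision_recall_f1([y[i] for i in test], y_pred))
    return tuple(sum(column) / k for column in zip(*scores))
```

In practice the data would normally be shuffled (and often stratified by label) before splitting, so that every fold contains both classes.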

For spambot_deleted, the F1-scores of the four models are above 0.85, with SVM performing best at an F1-score of 0.887. In terms of precision, the value of 0.795 for RF is worth noting. I created the confusion matrices of the four models with a train/test split of 0.85/0.15 and found a non-negligible rate of false positives. If any of these models were deployed with a machine-in-the-loop approach to support stewards in global locks, this result would imply that, although more spambot activity would be detected, stewards would also have to spend more resources on examining misclassified benign activity.
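The false positive rate mentioned above follows directly from a confusion matrix computed on the held-out part of a 0.85/0.15 split. The sketch below is a minimal illustration with hypothetical function names; the random shuffling with a fixed seed is an assumption, as the source does not describe how the split was drawn.

```python
import random


def train_test_split_indices(n, test_fraction=0.15, seed=0):
    """Shuffle indices 0..n-1 and hold out test_fraction of them for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives (1 = spambot, 0 = benign)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def false_positive_rate(cm):
    """Share of benign entries that were flagged as spambot activity."""
    negatives = cm["fp"] + cm["tn"]
    return cm["fp"] / negatives if negatives else 0.0
```

Each false positive corresponds to a benign entry that a steward would have to review, which is why this rate matters for a machine-in-the-loop deployment.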

 

To better understand this model, I conducted a feature importance analysis of the Random Forest model. Results of the analysis are presented in the figure below. Most of the top-ranked features are temporal ones. I inspected the values of these features and built boxplots of their value distribution. In comparison to entries of benign editors, entries in spambot_deleted typically correspond to the first revision of the page and of the editor. That is, spambot activity in this dataset across Wikipedia language editions can be largely characterized as URLs in the first revision of a new editor.

 
 

For spambot_diff, results are similar. I discussed with stewards why those spambot revisions were not deleted. Two possible explanations are (1) no one noticed that the links were added and MediaWiki doesn’t require a confirmation for the edits when the user was locked, and (2) someone decided that, for some reason, the revision with a link was actually valuable for the article.

Finally, the dataset has been extended to retrieve the non-public content of all the deleted revisions by spambots via API calls (https://www.mediawiki.org/wiki/API:Deletedrevisions), together with additional public metadata. This new dataset has been made available to the researchers working on the new generation of ML models to support patrolling and anti-vandalism tasks.
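As an illustration of the kind of request involved, the sketch below only builds the query URL for the deletedrevisions module and sends nothing. The helper name, the chosen properties and the example endpoint are assumptions, not the calls actually used in the project. Reading deleted revisions requires an account with the corresponding permissions, so an unauthenticated request would not return the content.

```python
from urllib.parse import urlencode


def deleted_revisions_url(api_endpoint, title,
                          props=("ids", "timestamp", "user", "comment", "content")):
    """Build a MediaWiki Action API URL asking for the deleted revisions of a page.

    api_endpoint: e.g. "https://en.wikipedia.org/w/api.php" (example value).
    props: revision properties to request through the drvprop parameter.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "deletedrevisions",
        "titles": title,
        "drvprop": "|".join(props),  # multiple values are pipe-separated
    }
    return api_endpoint + "?" + urlencode(params)
```

A real client would also need to authenticate and to follow the continuation values returned by the API when a page has many deleted revisions.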