Research:Warning Templates in Huggle
This project is an experimental study measuring the effects of the warning messages that vandal fighters send to new users with the tool Huggle. Specifically, it investigates whether messages with personalized greetings, teaching content, and images are better at retaining good-faith editors and deterring bad-faith editors.
In Huggle, vandal fighters are presented with a queue of edits and given the option to revert an edit and send its author a templated warning message. The default warning message for a user who has not previously been warned was varied across three binary variables -- personalized, teaching, and image -- for a total of eight different messages. The default configuration of Huggle was changed so that when a vandal fighter left a default warning, the software would randomly post one of our eight variants.
Process
Previous research has suggested that new users have received substantially fewer personalized messages since 2006, and that there may be a creeping institutionalization of Wikipedia's administrators and vandal-fighting corps. Personalized messages are intended to emphasize that the person who reverted the edit is a volunteer who individually decided to reject it. This may lead to higher retention of good-faith contributors whose edits were reverted, but it may also lead to increased vandalism by malicious users. Alternatively, vandals might be shamed into stopping their actions, and even converted into good-faith contributors, by a friendly, personalized introduction.
Methodology
With assistance from the Huggle developers, in particular User:Addshore, we developed a template that randomly substitutes one of eight different templates based on a modulus function applied to the current second: w:Template:Uw-test-rand1. The global Huggle configuration was updated so that when a user clicked the standard "revert and warn" button in Huggle, the software would insert this template instead of the default (w:Template:Uw-warning1). To keep track of which users received which messages, we polled the templatelinks table in the enwiki database every 15 seconds, as each of the messages transcluded a different blank z-template. To determine which users actually read their messages, we also polled the user_newtalk table in the enwiki database every 15 seconds -- the table that keeps track of whether a user has a new message, which MediaWiki uses to display the yellow 'new message' banners.
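The second-modulus selection can be sketched as follows. This is an illustrative Python model of the randomization logic, not the wikitext of the actual template, and the variant labels are ours:

```python
# The eight message variants; labels are ours, not the template names.
VARIANTS = [
    "default / no image",
    "default / image",
    "personalized / no image",
    "personalized / image",
    "teaching / no image",
    "teaching / image",
    "teaching+personalized / no image",
    "teaching+personalized / image",
]

def pick_variant(second):
    """Select a variant from the current clock second (0-59).
    Note that 60 is not divisible by 8, so the first four variants
    are selected on 8 of the 60 seconds and the rest on 7 -- the
    assignment is random-like but not perfectly uniform."""
    return VARIANTS[second % len(VARIANTS)]
```

Because the selection depends only on the second at which the template is substituted, the assignment is effectively independent of which editor receives the warning.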
Analysis
We analyzed the data using a number of outcome metrics. Whether or not the warned user continues to edit is a mixed measure that is, by itself, a poor indicator of the success of each template category: we want good-faith contributors to continue editing, but we do not want bad-faith contributors to keep vandalizing. Furthermore, we certainly do not want good-faith contributors to begin vandalizing after being reverted and warned, while we would very much like vandals to 'convert' into good-faith contributors. Because of these varied possibilities, we qualitatively analyzed the contributions of each editor before and after they received each message.
Qualitative Coding
Two human coders reviewed the revision history of each warned editor, both before and after the editor received the warning message. A four-point ordinal scale was used to rate the quality of each user's edits, with levels roughly corresponding to actions relevant to vandal fighters and administrators. Editors with a score of 1, for example, are blatant vandals whose actions merit at least a temporary block, while editors with a score of 4 are good-faith contributors whose edits should not have been reverted as vandalism. The scoring system is described below, with examples of content. In addition, for users who edited after they received a message, the coders indicated whether the warned user contacted the editor who reverted them, and whether this contact was retaliatory. The two coders independently rated the contributions of all the editors in this sample, with a 10% overlap to test for intercoder reliability, which was high: a Krippendorff's alpha (ordinal) value of 0.804. To ensure the consistency of ratings, before and after edits were coded separately and independently.
| score | criteria | examples | cases |
|---|---|---|---|
| 1 | blatant or sustained vandals, likely block | racial slurs; personal attacks against editors; libelous attacks against article subjects; obscene images; digit transposing; violations of 3RR | 1, 2, 3 |
| 2 | vandals who damage articles, but not yet block-worthy | absurd/joke edits (trying to be funny); removing content from articles; non-libelous attacks against article subjects | 1, 2, 3 |
| 3 | test edits and good faith that should be reverted | keyboard mashing; POV pushing; inserting patently non-notable facts; vandals who immediately revert their own edits | 1, 2, 3, 4 |
| 4 | good faith edits that should not be completely reverted | edits slightly out of policy (e.g. correct statement, but no references added); string of good edits with an obvious unintentional mistake | 1, 2, 3, 4 |
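For reference, the reliability statistic reported above can be computed as follows. This is a minimal sketch of Krippendorff's alpha with the ordinal distance metric, for exactly two coders and no missing data; it is not the study's actual analysis script:

```python
from collections import Counter

def ordinal_alpha(pairs):
    """Krippendorff's alpha (ordinal metric) for two coders with no
    missing data. `pairs` holds (coder1, coder2) ratings on an ordered
    scale such as the 1-4 scale above."""
    o = Counter()  # coincidence counts: each unit contributes both orders
    for a, b in pairs:
        o[(a, b)] += 1
        o[(b, a)] += 1
    cats = sorted({c for pair in pairs for c in pair})
    n_c = {c: sum(o[(c, k)] for k in cats) for c in cats}  # marginals
    n = sum(n_c.values())

    def delta(c, k):
        # Ordinal distance: squared sum of marginals lying between c and k,
        # minus half the two endpoint marginals.
        lo, hi = min(c, k), max(c, k)
        between = sum(n_c[g] for g in cats if lo <= g <= hi)
        return (between - (n_c[c] + n_c[k]) / 2) ** 2

    # Observed and expected disagreement.
    d_o = sum(o[(c, k)] * delta(c, k) for c in cats for k in cats) / n
    d_e = sum(n_c[c] * n_c[k] * delta(c, k)
              for c in cats for k in cats) / (n * (n - 1))
    return 1 - d_o / d_e
```

Perfect agreement yields an alpha of 1.0, and any disagreement pulls the value below 1; the 0.804 reported above indicates high agreement on this ordinal scale.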
The ratings of edits made before the message was read were then used to identify three populations of users: blatant vandals (score < 2), test editors (score between 2 and 3, inclusive), and good workers (score > 3).
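These thresholds can be expressed directly; the function and group labels here are our illustrative names:

```python
def classify(before_score):
    """Assign a warned editor to a population from the pre-warning
    quality rating on the 1-4 scale, using the thresholds above."""
    if before_score < 2:
        return "blatant vandal"
    if before_score <= 3:  # between 2 and 3, inclusive
        return "test editor"
    return "good worker"
```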
Outcome metrics
Outcome measures were calculated as follows:
- Stays: Does the editor continue to edit in any namespace after being warned?
- Contact: Does the editor attempt to contact the user who warned them? Note: Users who ask questions directed to the reverting editor in any space or manner were flagged as attempting to contact -- this includes inserting the text "Why did my edit get reverted?" on their own talk page, the article's talk page, an edit summary, or even the article text itself.
- Good contact: If the editor does contact the user who warned them, is this contact non-retaliatory? Retaliation includes vandalizing a user's talk or user page, but does not include asking why they were reverted on the reverter's user page. Coders did not attempt to determine whether blatant vandals were making facetious or insincere requests, although repeated messaging that bordered on harassment was classified as retaliation.
- Improves: If the user continued to edit after reading the message, is the quality rating for the user's edits after the message greater than the rating for their edits before the message? Note that in the case of 'good workers', who began with highly rated contributions, it is nearly impossible to improve.
- Good outcome: A combined metric of the above outcomes. If the user was a blatant vandal, improving, not staying, or good contact is considered a good outcome. If the user was a good worker, continuing to edit or good contact is considered a good outcome.
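The combined metric can be sketched as a small predicate. The function name and group labels are ours, and the text above does not spell out a rule for the test-editor group, so this sketch covers only the two groups it defines:

```python
def good_outcome(group, stays, good_contact, improves):
    """Combined outcome metric from the definitions above.
    No rule for 'test editor' is given in this writeup."""
    if group == "blatant vandal":
        # Improving, leaving, or making good contact all count as good.
        return improves or (not stays) or good_contact
    if group == "good worker":
        # Continuing to edit or making good contact counts as good.
        return stays or good_contact
    raise ValueError("no rule given for group: " + group)
```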
Variables added and removed
Regression tables are not included for image/no-image pairs, because no outcome metric was even marginally statistically significant in either direction. Messages with and without images were therefore merged into the same category, reducing the message variables to two: teaching and personalized (plus their combination). In addition, three potentially extraneous variables were included in the regression: whether or not the user was a registered user, the number of edits to non-talk pages made before receiving a warning, and the number of edits to talk pages made before receiving a warning.
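As a sketch, the message variants might be treatment-coded against the default message, matching the separate coefficients for teaching, personal, and the combined message reported in the regression tables below. The function and field names here are illustrative, not the study's actual column names:

```python
def design_row(variant, is_anonymous, edits_nontalk, edits_talk):
    """One regression row. `variant` is 'default', 'teaching',
    'personal', or 'teaching+personal' (image variants already merged).
    The default message is the baseline: all three dummies are zero."""
    return {
        "is_anonymous": int(is_anonymous),
        "edits_before_nontalk": edits_nontalk,
        "edits_before_talk": edits_talk,
        "teaching": int(variant == "teaching"),
        "personal": int(variant == "personal"),
        "teaching_and_personal": int(variant == "teaching+personal"),
    }
```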
Messages tested
default - image / default - no image

Welcome to Wikipedia. Although everyone is welcome to contribute to Wikipedia, at least one of your recent edits, such as the one you made to Science with this edit has been reverted, as it appears to be unconstructive. Use the sandbox for testing; if you believe the edit was constructive, ensure that you provide an informative edit summary. You may also wish to read the introduction to editing. Thank you. StuGeiger (talk) 19:40, 18 July 2011 (UTC)

personalized - no image

Hello, and welcome to Wikipedia! I edit Wikipedia too, under the username StuGeiger. I noticed that one of your recent edits, such as the one you made to Science with this edit, appeared to be unconstructive, and I've reverted it. In the future, please use the sandbox for testing and be sure to provide an informative edit summary. You may also wish to read the introduction to editing. Please feel free to ask me questions about editing Wikipedia (or anything else) on my talk page. StuGeiger (talk) 19:11, 18 July 2011 (UTC)

personalized - image

(variant with image; message text not preserved in this copy)

teaching - no image

Welcome to Wikipedia. Although everyone is welcome to contribute to Wikipedia, at least one of your recent edits, such as the one you made to Science with this edit, has been reverted, as it appears to be unconstructive. For constructive edits, always provide an informative edit summary so that editors have a brief description of your intentions. If you want to try out how editing Wikipedia works, please edit the sandbox page and not encyclopedia articles. If you have questions about editing Wikipedia, you might want to take a look at this tutorial. You may also wish to read the introduction to editing. Thank you. StuGeiger (talk) 19:25, 18 July 2011 (UTC)

teaching - image

(variant with image; message text not preserved in this copy)

teaching and personalized - no image

Welcome to Wikipedia! I edit Wikipedia too, under the username StuGeiger. Wikipedia is an all-volunteer operation and I am one of the many volunteers here who watch for unconstructive edits. Everyone is welcome to contribute to Wikipedia, but I noticed that one of your recent edits, such as the one you made to Science with this edit, appeared to be unconstructive. I have reverted it and ask that in the future you please use the sandbox for test edits, not encyclopedia articles. For constructive edits, always provide an informative edit summary so that other editors like me have a brief description of your intentions. If you have questions about editing Wikipedia, you might want to take a look at this tutorial. Also, feel free to ask me questions about editing Wikipedia (or anything else) on my talk page. StuGeiger (talk) 19:12, 18 July 2011 (UTC)

teaching and personalized - with image

(variant with image; message text not preserved in this copy)
Results and discussion
Note that the following font weights are used to indicate levels of significance:
- Bold and italics: Substantial significance (p-value less than .001)
- Bold only: Standard statistical significance (p-value between .001 and .05)
- Italics only: Marginal significance (p-value between .05 and .15)
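A literal encoding of this legend, using the same labels that appear in the tables' significance columns (the function name and the strict-vs.-inclusive boundary handling are our assumptions):

```python
def significance_label(p):
    """Map a p-value to the labels used in the significance columns."""
    if p < 0.001:
        return "substantially"   # substantial significance
    if p < 0.05:
        return "significant"     # standard statistical significance
    if p < 0.15:
        return "marginally"      # marginal significance
    return ""                    # not significant
```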
Improves
Does the editor improve the quality of their work?
| Coefficients | blatant vandals | | | | test editors | | | | good workers | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance |
| (Intercept) | -0.068 | 0.175 | 0.698 | | 0.135 | 0.0784 | 0.085 | marginally | -0.011 | 0.083 | 0.892 | |
| is anonymous | 0.086 | 0.157 | 0.585 | | 0.050 | 0.067 | 0.456 | | 0.079 | 0.073 | 0.279 | |
| # of edits before message (non-talk namespaces) | 0.030 | 0.008 | <0.001 | substantially | 0.007 | 0.003 | 0.042 | significant | -0.0007 | 0.001 | 0.655 | |
| # of edits before message (talk namespaces) | -0.00008 | 0.0390 | 0.998 | | 0.007 | 0.017 | 0.700 | | 0.004 | 0.009 | 0.591 | |
| teaching message | 0.186 | 0.154 | 0.232 | | 0.031 | 0.073 | 0.667 | | -0.040 | 0.0860 | 0.641 | |
| personal message | 0.129 | 0.122 | 0.296 | | 0.020 | 0.074 | 0.787 | | -0.00003 | 0.070 | 1.000 | |
| teaching and personal message | -0.142 | 0.203 | 0.486 | | -0.025 | 0.0999 | 0.799 | | 0.066 | 0.1162 | 0.569 | |
- The more edits an editor performs before receiving a first-level warning, the more likely they are to improve, for editors in the "blatant vandals" and "test editors" groups.
- Note: due to the metrics we used, it is nearly impossible for an editor to be both classified as a "good worker" and improve, because good workers were already at the top of the edit quality scale.
Contact
Did the reverted editor try to contact the Huggler at all?
| Coefficients | blatant vandals | | | | test editors | | | | good workers | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance |
| (Intercept) | 0.261 | 0.147 | 0.080 | marginally | 0.131 | 0.057 | 0.023 | significant | 0.326 | 0.146 | 0.029 | significant |
| is anonymous | -0.141 | 0.130 | 0.281 | | -0.056 | 0.049 | 0.254 | | -0.267 | 0.127 | 0.040 | significant |
| # of edits before message (non-talk namespaces) | -0.005 | 0.007 | 0.445 | | -0.002 | 0.003 | 0.439 | | 0.004 | 0.003 | 0.189 | |
| # of edits before message (talk namespaces) | 0.112 | 0.034 | 0.002 | substantially | 0.106 | 0.013 | < .0001 | substantially | 0.040 | 0.016 | 0.014 | significant |
| teaching message | -0.228 | 0.136 | 0.098 | marginally | -0.004 | 0.054 | 0.946 | | -0.175 | 0.149 | 0.246 | |
| personal message | 0.062 | 0.105 | 0.559 | | 0.070 | 0.055 | 0.202 | | 0.101 | 0.122 | 0.412 | |
| teaching and personal message | 0.317 | 0.176 | 0.077 | marginally | -0.061 | 0.074 | 0.408 | | 0.314 | 0.201 | 0.124 | |
- Across all messages, anonymous editors who are reverted for making quality edits are significantly less likely to contact the reverting Huggler.
- Reverted editors who are not blatant vandals are more likely to contact the reverting editor if the message was personalized.
- This makes sense: a personalized message is more inviting, and it reminds the reverted editor that the revert was actually performed by a person.
- It is interesting that the probability of contacting the reverting editor does not increase for blatant vandals. This is probably a good thing for Hugglers.
Good contact
For those reverted editors who tried to make contact, was the contact good (i.e., non-retaliatory)?
| Coefficients | blatant vandals | | | | test editors | | | | good workers | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance |
| (Intercept) | 0.360 | 0.278 | 0.232 | | 0.529 | 0.242 | 0.036 | significant | 0.885 | 0.229 | 0.003 | substantially |
| is anonymous | 0.039 | 0.222 | 0.866 | | 0.096 | 0.193 | 0.624 | | -0.281 | 0.192 | 0.172 | |
| # of edits before message (non-talk namespaces) | -0.024 | 0.050 | 0.645 | | 0.010 | 0.026 | 0.700 | | 0.005 | 0.004 | 0.220 | |
| # of edits before message (talk namespaces) | -0.019 | 0.054 | 0.736 | | 0.045 | 0.032 | 0.167 | | 0.004 | 0.016 | 0.806 | |
| teaching message | -0.999 | 0.187 | <0.001 | substantially | -0.076 | 0.275 | 0.784 | | 0.159 | 0.187 | 0.414 | |
| personal message | 0.681 | 0.217 | 0.014 | significant | -0.149 | 0.254 | 0.562 | | 0.001 | 0.219 | 0.997 | |
| teaching and personal message | NA | NA | NA | | -0.129 | 0.353 | 0.716 | | NA | NA | NA | |
Note: This regression covers only reverts in which the reverted editor contacted the Huggler who reverted them.
- For blatant vandals, the probability of negative contact increased with the teaching message but decreased with the personal message.
- One theory for the decrease in negative contact with the personal message is embarrassment: a vandal might be less enthusiastic about vandalizing when they realize they are causing another human being trouble.
- It is interesting that the teaching message without the personalized component predicts an increase in the probability of negative contact.
Stay
Will the reverted editor perform any edits after reading their message?
| Coefficients | blatant vandals | | | | test editors | | | | good workers | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance |
| (Intercept) | 0.288 | 0.125 | 0.023 | significant | 0.425 | 0.058 | < .0001 | substantially | 0.741 | 0.158 | < .001 | substantially |
| is anonymous | 0.142 | 0.107 | 0.186 | | -0.216 | 0.052 | < .0001 | substantially | -0.268 | 0.139 | 0.056 | marginally |
| # of edits before message (non-talk namespaces) | 0.019 | 0.008 | 0.018 | significant | 0.013 | 0.003 | < .0001 | substantially | 0.008 | 0.003 | 0.011 | significant |
| # of edits before message (talk namespaces) | 0.167 | 0.040 | < .0001 | substantially | 0.119 | 0.018 | < .0001 | substantially | 0.024 | 0.019 | 0.207 | |
| teaching message | -0.227 | 0.106 | 0.034 | significant | 0.076 | 0.043 | 0.082 | marginally | -0.146 | 0.130 | 0.264 | |
| personal message | -0.050 | 0.098 | 0.611 | | 0.071 | 0.045 | 0.115 | | 0.013 | 0.118 | 0.915 | |
| teaching and personal message | 0.097 | 0.144 | 0.501 | | -0.076 | 0.062 | 0.219 | | 0.046 | 0.178 | 0.797 | |
- In all cases, the number of edits an editor performed before reading the message is a positive predictor of continuing to edit afterward.
- The teaching message predicted a decrease in the probability that a blatant vandal sticks around to perform more edits after reading the message.
- This is probably a good thing, as driving off blatant vandals is more efficient than reverting them again.
- We found marginal significance that, for test editors, the teaching and personal messages increase the probability that the editor continues to edit. This is probably a good thing, but it should be taken with a grain of salt given the marginal significance.
- In the case of the combined personal-and-teaching message, the effects of the personal and teaching elements do not appear to be additive; the combined coefficient is slightly lower. It may be better to include either the personal or the teaching element in a message, rather than both.
Overall good outcome
| Coefficients | blatant vandals | | | | test editors | | | | good workers | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance | Estimate | std. error | p-value | significance |
| (Intercept) | 0.649 | 0.124 | < .0001 | substantially | 0.404 | 0.063 | < .0001 | substantially | 0.597 | 0.159 | < .001 | substantially |
| is anonymous | -0.128 | 0.106 | 0.229 | | 0.058 | 0.058 | 0.316 | | -0.253 | 0.140 | 0.073 | marginally |
| # of edits before message (non-talk namespaces) | 0.003 | 0.008 | 0.676 | | 0.003 | 0.004 | 0.452 | | 0.009 | 0.003 | 0.007 | substantially |
| # of edits before message (talk namespaces) | -0.118 | 0.040 | 0.004 | substantially | 0.010 | 0.020 | 0.608 | | 0.023 | 0.019 | 0.231 | |
| teaching message | 0.260 | 0.105 | 0.014 | significant | -0.016 | 0.048 | 0.740 | | -0.064 | 0.131 | 0.626 | |
| personal message | 0.146 | 0.097 | 0.135 | | 0.001 | 0.049 | 0.981 | | 0.114 | 0.119 | 0.344 | |
| teaching and personal message | -0.204 | 0.142 | 0.152 | | -0.013 | 0.068 | 0.845 | | -0.007 | 0.179 | 0.967 | |