Research:Modeling undisclosed paid editors
This page documents a proposed research project.
Information may be incomplete and may change before the project starts.
We seek to build an automated system for detecting sock-puppet accounts related to undisclosed paid editing (UPE) on Wikipedia. In past work, we used metadata signals to track UPE activity[1]. In this study, we will extend that work with linguistic features computed from the deleted content of UPE activity.
Methods
We plan to compute a set of features describing how these accounts behave on Wikipedia. Examples include the average size of edits, the average time between edits, and the percentage of edits per Wikipedia namespace (e.g., Talk or User pages). We will also consider linguistic features extracted from the content of Wikipedians' contributions that measure, for instance, the sentiment of the text or the use of pronouns, punctuation, and specific keywords. To compute these features accurately, we need access to the whole edit history of the accounts under consideration; hence, deleted edits must be included in the computation.
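As an illustration only, the features listed above could be computed along the following lines. This is a minimal sketch, not the project's implementation: the `Edit` record, its field names, the `PRONOUNS` word list, and both function names are hypothetical. It assumes the caller has already assembled an account's full edit history, deleted edits included, into a list. Sentiment is omitted because it would require a lexicon or trained model that this page does not specify.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
import re
import string


@dataclass
class Edit:
    """One revision by an account (hypothetical record layout)."""
    timestamp: datetime
    size_delta: int   # bytes added (+) or removed (-) by the edit
    namespace: str    # e.g. "Main", "Talk", "User"
    text: str = ""    # text contributed by the edit, if available


# Small illustrative pronoun list; an assumption, not a project resource.
PRONOUNS = {"i", "we", "you", "he", "she", "they", "it", "my", "our", "your"}


def behavioral_features(edits):
    """Average edit size, average gap between edits, and namespace shares."""
    if not edits:
        return {}
    ordered = sorted(edits, key=lambda e: e.timestamp)
    sizes = [abs(e.size_delta) for e in ordered]
    # Gaps between consecutive edits, in seconds.
    gaps = [
        (later.timestamp - earlier.timestamp).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    features = {
        "avg_edit_size": sum(sizes) / len(sizes),
        "avg_seconds_between_edits": sum(gaps) / len(gaps) if gaps else 0.0,
    }
    # Percentage of the account's edits made in each namespace.
    for namespace, count in Counter(e.namespace for e in ordered).items():
        features[f"pct_edits_{namespace}"] = 100.0 * count / len(ordered)
    return features


def linguistic_features(edits):
    """Pronoun rate per word and punctuation rate per character."""
    text = " ".join(e.text for e in edits)
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if not tokens:
        return {"pronoun_rate": 0.0, "punctuation_rate": 0.0}
    pronoun_count = sum(token in PRONOUNS for token in tokens)
    punctuation_count = sum(ch in string.punctuation for ch in text)
    return {
        "pronoun_rate": pronoun_count / len(tokens),
        "punctuation_rate": punctuation_count / len(text),
    }
```

Calling `behavioral_features(history)` and `linguistic_features(history)` on the same list and merging the two dictionaries would give one feature vector per account. Keeping the two families separate makes it easy to compare a metadata-only model with one that also uses linguistic signals.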
Timeline
Please provide in this section a short timeline with the main milestones and deliverables (if any) for this project.
Policy, Ethics and Human Subjects Research
It's very important that researchers do not disrupt Wikipedians' work. Please add to this section any consideration relevant to ethical implications of your project or references to Wikimedia policies, if applicable. If your study has been approved by an ethical committee or an institutional review board (IRB), please quote the corresponding reference and date of approval.
See also
- Phab:T252894 -- Unsophisticated bad actors dataset
Results
Once your study completes, describe the results and their implications here. Don't forget to set status=complete above when you are done.
References
- ↑ Joshi, N., Spezzano, F., Green, M., & Hill, E. (2020, April). Detecting Undisclosed Paid Editing in Wikipedia. In Proceedings of The Web Conference 2020 (pp. 2899-2905). https://dl.acm.org/doi/abs/10.1145/3366423.3380055