Research:Wikipedia during 2024 Elections
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
Introduction
Wikipedia is a cornerstone of the online information ecosystem, often serving as a primary source of knowledge on a wide range of topics. Its open and collaborative model allows anyone to contribute up-to-date information, making Wikipedia a key resource for understanding both historical and current affairs.
During major events such as the 2024 US Presidential Election, Wikipedia is likely to attract growing attention from readers seeking comprehensive and reliable information. However, politically charged events also pose substantial challenges to its model: they can bring an influx of newcomers alongside increased vandalism, testing the effectiveness of Wikipedia's content moderation practices. Analyzing how the platform navigated the 2024 US Presidential Election is therefore essential to understanding community efforts to incorporate new content, retain new contributors, and preserve knowledge integrity.
Background
2024 US Presidential Election
The 2024 US presidential election was held on Tuesday, November 5, 2024. The Republican ticket of Donald Trump, 45th president of the United States (2017-2021), and JD Vance, junior senator from Ohio, defeated the Democratic ticket of Kamala Harris, incumbent vice president, and Tim Walz, governor of Minnesota. Elections were also held that day for the Senate, the House of Representatives, governorships, and various state and local offices, along with several ballot measures.
The campaign was marked by several major events. The most notable in its final months included:
- June 27: Debate between presidential candidates Joe Biden and Donald Trump.
- July 13: First assassination attempt on Donald Trump at a rally in Butler (Pennsylvania).
- July 21: Announcement from President Joe Biden of his withdrawal from the race, followed by the announcement from Vice President Kamala Harris of her candidacy for president.
- August 23: Ratification of Kamala Harris as the Democratic Party's candidate.
- September 10: Debate between presidential candidates Donald Trump and Kamala Harris.
- September 15: Second assassination attempt on Donald Trump at a golf club in West Palm Beach (Florida).
- October 1: Debate between vice presidential candidates JD Vance and Tim Walz.
- November 6: The Republican ticket of Donald Trump and JD Vance is declared the winner of the election.
Prior research
Wikipedia has proven to be a generally reliable resource for political science studies, especially for recent or prominent topics[1]. As a platform where communities actively document and link historical and contemporary events[2], it is unsurprising that research methods using Wikipedia data can be as trustworthy as traditional approaches based on large-scale expert surveys[3].
A key difference between Wikipedia and most user-generated content platforms is that Wikipedia editors must collaborate to create and expand knowledge. While many editors interested in political topics may focus on specific areas[4], they have to interact with others holding different political views[5]. As a result, articles that receive contributions from politically diverse editors tend to be of higher quality[6].
Prior research has examined Wikipedia in the specific context of elections. Some studies have used Wikipedia data for electoral predictions[7][8][9], similar to how it has been leveraged in forecasting other outcomes, such as movie box office performance[10], drug demand[11] or global diseases[12]. Other works have explicitly focused on how US elections were covered in Wikipedia. A study of the 2016 US Presidential election revealed that patterns of information production and consumption aligned closely with major campaign events[13]. A more recent analysis of the 2024 US Presidential election also found an increase in Wikipedia edits during politically charged moments, often accompanied by an increased risk of misinformation[14].
Wikipedia, like any platform in the online information ecosystem, faces the risk of misinformation[15][16]. Although the growing complexity of threats and the rising number of attacks can pose significant challenges for patrollers, Wikipedia's content moderation model is considered generally effective in well-resourced communities[17]. Reverting is one of the best-known content moderation techniques on Wikipedia, allowing editors to undo problematic revisions. Several systems have been developed to automatically identify content to revert[18][19], some of which are currently used by Wikimedia communities[20]. However, reverts can be highly discouraging for newcomers and reduce their willingness to keep editing (i.e., "don't bite the newbies"[21]). Page protection is another core content moderation technique, one that has historically received less attention from researchers[22]. Recent studies have revealed that protecting pages has complex consequences, including differential effects on article quality[23] and other unintended effects due to the concentration of contributors[24].
Hypotheses
A list of hypotheses was proposed for this analysis. WMF staff from different teams were invited to submit hypotheses, indicating for each one whether it concerned readers or collaborators (in general or more specifically), whether it was supported by some observation, and whether it could help the work of the WMF. Because dozens of hypotheses were collected, they were prioritized using the following criteria:
- Data availability: Distinction between hypotheses that only require data from the Data Lake, MariaDB replicas or APIs, and those that require additional data.
- Complexity: Low by default, unless the analysis involves working with very large tables such as webrequests (medium) or causal inference (hard).
- Impact: Distinction between hypotheses with potential interest for all WMF audiences (high) and those that are relevant but perhaps not for all audiences (medium).
The prioritized hypotheses are presented in Table 1.
# | Hypothesis statement | Additional context |
---|---|---|
H1 | In the leadup to the election an increasing number of Wikipedia pages on political figures and topics are protected. | Presumably readership will increase on core topics as we get closer to the election, and along with it an increase in poor contributions, leading to page protection. |
H2 | In the leadup to the election an increasing number of IP edits and newcomer edits are reverted. | As people are incentivised to promote or obstruct a candidate, more newcomers are making politically biased edits (or edits that violate NPOV), and thus more newcomer edits are reverted. |
H3 | During the US election campaign, low-quality edits in relevant articles are quickly moderated. | Wikipedia's main strength lies not only in its barriers to misinformation but also in its effective methods for removing it quickly. |
H4 | Due to people going to the polls or being on the move, the increase we see in traffic will primarily be from mobile devices. | An increase in the proportion of mobile pageviews relative to desktop pageviews on election day. |
H5 | The 30-day newcomer retention rate for users who edit "election-related articles" will be significantly lower relative to the average newcomer retention rate. | One might assume that most new edits will come from one-time editors making low-quality edits, and that the extra burden on moderators will leave less capacity to help newcomers make constructive edits. |
H6 | Registration rates increase sharply from the date of the election (and remain elevated until inauguration). | It may be useful to restrict this geographically to the US and/or to compare it with registration rates in India during the 2024 Indian general election. |
Dataset
Hypotheses H1, H2, H3, and H5 require identifying election-related articles on English Wikipedia. For this purpose, a dataset was created from two templates (Template:2024 United States presidential election and Template:2024 United States elections), together with the lists of United States senators and members of the United States House of Representatives as of November 28. The resulting dataset comprises 1,136 election-related articles. Leveraging the table-based structure of these templates and lists, additional attributes such as category, party, and state were added (see distributions in Figure 1).
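A minimal sketch of how such a dataset could be merged, assuming the templates and lists have already been parsed into records with a title, category, party, and state. The `Article` record, the `build_dataset` and `distribution` helpers, and the rule that the first source listing a title decides its category are all hypothetical, not the project's actual pipeline. Keying by page title deduplicates articles that appear in more than one source.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Article:
    title: str
    category: str                 # hypothetical label, e.g. "candidate" or "senator"
    party: Optional[str] = None
    state: Optional[str] = None


def build_dataset(*sources: Iterable[Article]) -> dict[str, Article]:
    """Merge article records from several sources, keyed by page title.

    Assumed rule: the first source that lists a title decides its category;
    party and state are filled in from later sources if still missing.
    """
    merged: dict[str, Article] = {}
    for source in sources:
        for art in source:
            prev = merged.get(art.title)
            if prev is None:
                merged[art.title] = art
            else:
                merged[art.title] = Article(
                    title=prev.title,
                    category=prev.category,
                    party=prev.party or art.party,
                    state=prev.state or art.state,
                )
    return merged


def distribution(dataset: dict[str, Article], attr: str) -> Counter:
    """Count articles by one attribute ("category", "party" or "state")."""
    return Counter(getattr(a, attr) for a in dataset.values())
```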
Figure 1: Distribution of election-related articles by (a) category, (b) party, and (c) state.
Analysis
Hypothesis 1
In the leadup to the election an increasing number of Wikipedia pages on political figures and topics are protected.
When disruptive editing (e.g., vandalism, edit warring) increases, it may become necessary to restrict editing of certain pages to specific groups of editors. While Wikipedia offers various types and levels of protection, this analysis simplifies this content moderation technique by considering a page as protected if any form of protection is active.
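Under this simplification, a daily count of protected pages can be sketched by treating each protection as a time interval and counting, for each day, the distinct pages covered by at least one interval. The sketch assumes the intervals have already been reconstructed (for example, from the protection log), with `None` standing for an indefinite expiry; the interval layout and the `protected_pages_per_day` helper are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical input: (page_title, start_date, expiry_date or None if indefinite)
Interval = tuple[str, date, Optional[date]]


def protected_pages_per_day(intervals: list[Interval], first: date, last: date) -> dict[date, int]:
    """Count pages with any active protection on each day in [first, last].

    A page is counted once per day even if several protections (e.g. edit and
    move) overlap. A protection is treated as active on day d if
    start <= d < expiry.
    """
    counts: dict[date, int] = {}
    day = first
    while day <= last:
        active = {
            title
            for title, start, expiry in intervals
            if start <= day and (expiry is None or day < expiry)
        }
        counts[day] = len(active)
        day += timedelta(days=1)
    return counts
```

Counting distinct titles rather than intervals is what makes "any form of active protection" a per-page yes/no flag.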
Figure 2 (left) shows the daily number of protected election-related articles since 2021, highlighting an increasing trend as the election approached. Figure 2 (right) focuses on the period since January 2024, when the number of protected election-related articles ranged between 80 and 90. The maximum of 90 protected articles was reached on the day the winner was announced, although the same value was also observed on other days in June, July, and August.
Figure 2: Daily number of protected election-related articles, (a) 2021-2024 and (b) 2024.
Since 2021, articles related to all major parties have been protected at some point: 105 related to the Republican Party, 60 to the Democratic Party, 2 to the Libertarian Party, and 2 to the Green Party. Among the 29 articles that remained under some level of protection throughout this period, 15 were related to the Democratic Party (Democratic Party, Kamala Harris, Joe Biden, Tammy Duckworth, Elizabeth Warren, Cory Booker, Chuck Schumer, Kirsten Gillibrand, Tim Kaine, Nancy Pelosi, Eric Swalwell, Debbie Wasserman Schultz, Hank Johnson, Rashida Tlaib, Alexandria Ocasio-Cortez), 12 to the Republican Party (Republican Party, Donald Trump, Mike Pence, Marco Rubio, Chuck Grassley, Joni Ernst, Mitch McConnell, Rand Paul, Ted Cruz, Mitt Romney, Darrell Issa, Matt Gaetz), 1 to the Green Party (Jill Stein), and 1 to an independent (Robert F. Kennedy Jr.)[nb 1].
Hypothesis 2
In the leadup to the election an increasing number of IP edits and newcomer edits are reverted.
For this hypothesis, edits made in 2024 to election-related articles are analyzed using the MediaWiki History dataset to identify IP edits and newcomer edits (those made within 30 days of a user account's creation). Figure 3 shows an upward trend in the number of monthly edits across all categories: all edits (top), IP edits (middle), and newcomer edits (bottom). Apart from the final month, activity was particularly high in the summer of 2024, coinciding with two major events: the first assassination attempt on Donald Trump, and President Joe Biden's announcement of his withdrawal, which led to Kamala Harris's ratification as the Democratic candidate.
Interestingly, the monthly revert ratio decreases over time in all categories. While some recent edits may still be reverted in the coming months, the overall trend in all categories has been declining since 2021. That is, although the number of IP and newcomer edits increases in the lead-up to the election, a decreasing share of them is reverted.
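A minimal sketch of the edit classification and the monthly revert ratio, assuming each edit record carries its timestamp, whether it was made without an account, the account's registration time, and a revert flag. These fields are loosely modeled on the kind of information available in the MediaWiki History dataset but are not its actual schema; the `Edit` type and the helper names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

NEWCOMER_WINDOW = timedelta(days=30)


@dataclass
class Edit:
    timestamp: datetime
    is_anonymous: bool               # IP edit, i.e. no registered account
    registered: Optional[datetime]   # account creation time; None for IP edits
    reverted: bool


def edit_group(e: Edit) -> str:
    """Classify an edit as "ip", "newcomer" (within 30 days of registration) or "other"."""
    if e.is_anonymous:
        return "ip"
    if e.registered is not None and e.timestamp - e.registered < NEWCOMER_WINDOW:
        return "newcomer"
    return "other"


def monthly_revert_ratio(edits: Iterable[Edit], group: Optional[str] = None) -> dict[tuple[int, int], float]:
    """Share of reverted edits per (year, month); group=None means all edits."""
    totals: dict[tuple[int, int], int] = defaultdict(int)
    reverted: dict[tuple[int, int], int] = defaultdict(int)
    for e in edits:
        if group is not None and edit_group(e) != group:
            continue
        key = (e.timestamp.year, e.timestamp.month)
        totals[key] += 1
        reverted[key] += int(e.reverted)
    return {k: reverted[k] / totals[k] for k in sorted(totals)}
```

Because the ratio is computed per month from edits already flagged as reverted, months near the end of the series can only rise as late reverts arrive, which is why recent values should be read with caution.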
Figure 3: (a) Count of all edits. (b) Count of IP edits. (c) Count of newcomer edits. (d) Revert ratio of all edits. (e) Revert ratio of IP edits. (f) Revert ratio of newcomer edits.
Hypothesis 3
During the US election campaign, low-quality edits in relevant articles are quickly moderated
A dataset comprising 148,924 edits to election-related articles made between January 2024 and the day the winner was announced was compiled. Of these, 15,556 (10%) were reverted[nb 2], with a median time to revert of between 2 and 3 hours (Q1: 520.75 seconds; Q2: 8,215 seconds; Q3: 543,849.5 seconds).
Edits to Wikipedia articles can serve various purposes, such as expanding content, improving structure, adding references or multimedia, or correcting typographical errors. Given this diversity, an edit is considered low-quality based on its probability of being reverted. Specifically, this analysis uses the multilingual revert risk model, which has been shown to perform better than the language-agnostic version, with an arbitrary threshold of 0.95. The threshold is deliberately set high to prioritize precision in detecting low-quality revisions, even if this means overlooking many revisions that could potentially be reverted[nb 3]. Of the 148,924 revisions, 798 (0.5%) were identified as low-quality. Among these, 615 (77%) were reverted, with a median revert time of under 2 minutes (Q1: 32.5 seconds; Q2: 112 seconds; Q3: 978 seconds).
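The thresholding and revert-time summary can be sketched as follows, assuming each revision already has a revert-risk score and, if it was reverted, the number of seconds until the revert. The source does not say whether the threshold is applied with `>=` or `>`; the sketch assumes `>=`. Quartiles use Python's `statistics.quantiles` with `method="inclusive"` (linear interpolation), which may differ slightly from the tool used in the original analysis. The `summarise` helper is hypothetical.

```python
from statistics import quantiles
from typing import Optional

THRESHOLD = 0.95  # high threshold: favors precision over recall


def summarise(revisions: list[tuple[float, Optional[float]]]) -> dict:
    """Summarise revisions flagged as low-quality.

    Each revision is (revert_risk_score, seconds_to_revert), with
    seconds_to_revert set to None when the revision was never reverted.
    """
    flagged = [(score, secs) for score, secs in revisions if score >= THRESHOLD]
    times = [secs for _, secs in flagged if secs is not None]
    q1, q2, q3 = quantiles(times, n=4, method="inclusive")  # needs >= 2 reverted revisions
    return {
        "flagged": len(flagged),
        "flagged_share": len(flagged) / len(revisions),
        "reverted_share": len(times) / len(flagged),
        "revert_time_quartiles_s": (q1, q2, q3),
    }
```

As a check against the reported counts, 798 / 148,924 ≈ 0.0054 and 615 / 798 ≈ 0.77, matching the rounded 0.5% and 77%.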
Again, it is important to note that some recent edits may still be subject to reversion in the coming months. Also, among edits that were not reverted, some were later modified by subsequent edits that partially or entirely removed their content. The articles with the highest number of low-quality edits were List of Republicans who opposed the Donald Trump 2024 presidential campaign (33), 2024 United States Senate election in Texas (18), Jasmine Crockett (13), List of Donald Trump 2024 presidential campaign endorsements (13), 2024 United States elections (12), 2024 Democratic Party presidential primaries (11), 2024 United States presidential election in Pennsylvania (11), Cenk Uygur (10), Nationwide opinion polling for the 2024 United States presidential election (10), and Clay Higgins (10).
Hypothesis 4
Due to people going to the polls or being on the move, the increase we see in traffic will primarily be from mobile devices.
Recent research has revealed differences in Wikipedia browsing habits between desktop and mobile devices: both access methods follow similar usage patterns throughout the day, but mobile usage rises significantly in the evening[25].
To investigate whether mobile usage increased on election day, Figure 4 presents time series of pageviews (EST) on English Wikipedia from IP addresses located in the United States. Figure 4a shows the number of pageviews by access method. The data supports prior research: mobile usage rises in the evening, desktop access is favored on weekday mornings, and mobile access predominates on weekend mornings. No traffic increase is visible on November 5, although mobile web traffic increased on November 6. These trends are further emphasized in Figure 4b, which shows the hourly percentage of pageviews from mobile web devices during the first two weeks of November 2024 (weekend days are marked by dashed lines). November 6 stands out as the only weekday with a preference for mobile access in the morning.
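The hourly mobile web share can be sketched as below, assuming hourly pageview counts keyed by UTC hour and access method, using the access-method values of the Wikimedia pageview APIs (`desktop`, `mobile-web`, `mobile-app`). The fixed UTC-5 offset matches the EST label of the figure but ignores daylight saving time (November 1-2, 2024 were still on EDT); `zoneinfo.ZoneInfo("America/New_York")` would handle that where time zone data is available. The `mobile_web_share_by_hour` helper is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset, matching the "EST" label of the figure (ignores DST).
EST = timezone(timedelta(hours=-5), "EST")


def mobile_web_share_by_hour(counts: dict[tuple[datetime, str], int]) -> dict[datetime, float]:
    """Percentage of pageviews from mobile web per Eastern-time hour.

    counts maps (timezone-aware UTC hour, access method) to pageviews, where the
    access method is e.g. "desktop", "mobile-web" or "mobile-app".
    """
    total: dict[datetime, int] = defaultdict(int)
    mobile: dict[datetime, int] = defaultdict(int)
    for (ts_utc, access), views in counts.items():
        hour = ts_utc.astimezone(EST)
        total[hour] += views
        if access == "mobile-web":
            mobile[hour] += views
    return {h: 100 * mobile[h] / total[h] for h in sorted(total) if total[h] > 0}
```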
Figure 4: (a) Pageviews by access method. (b) Percentage of pageviews from mobile web devices.
The most visited pages on English Wikipedia on November 6 between 6am and 8am (EST) were Main Page (246,126), 2024 United States presidential election (75,067), 2020 United States presidential election (74,331), Donald Trump (68,221), 2016 United States presidential election (34,327), 2020 United States elections (24,871), File:Reichstagsbrand.jpg (24,242), Project 2025 (23,906), JD Vance (22,621), and 2024 United States elections (21,829). These pages suggest that the anomalous peak of information consumption from mobile devices on a weekday morning was driven by the announcement of the election winner.
Hypothesis 5
The 30-day newcomer retention rate for users who edit "election-related articles" will be significantly lower relative to the average newcomer retention rate.
The success of open collaboration projects has been found to be highly correlated with the number of contributors they attract and retain[26]. Editor retention in Wikipedia is periodically measured by the Movement Insights team as the proportion of editors who, having made at least 1 edit in the first 30 days after registration, made at least 1 edit during the second 30 days.
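Following this definition, a cohort's retention rate can be sketched as below. The sketch assumes the first 30 days are [registration, registration + 30 days) and the second 30 days are [registration + 30 days, registration + 60 days); the exact boundary conventions of the Movement Insights metric are not given here, so they are assumptions, as is the optional `eligible_title` filter used to restrict the cohort to editors of election-related articles. The `retention_rate` helper is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional

WINDOW = timedelta(days=30)


def retention_rate(
    registrations: dict[str, datetime],
    edits: Iterable[tuple[str, datetime, str]],
    eligible_title: Optional[Callable[[str], bool]] = None,
) -> float:
    """Share of cohort users who edit again in days 30-60 after registration.

    A user joins the cohort with >= 1 qualifying edit in days [0, 30): any edit,
    or, if eligible_title is given, an edit to a page it accepts (e.g. an
    election-related article). Any edit in days [30, 60) counts as retention.
    """
    cohort: set[str] = set()
    active_later: set[str] = set()
    for user, ts, title in edits:
        reg = registrations.get(user)
        if reg is None:
            continue
        age = ts - reg
        if timedelta(0) <= age < WINDOW:
            if eligible_title is None or eligible_title(title):
                cohort.add(user)
        elif WINDOW <= age < 2 * WINDOW:
            active_later.add(user)
    if not cohort:
        return 0.0
    return len(cohort & active_later) / len(cohort)
```

Passing a predicate rather than a fixed title list keeps the same function usable for both the general newcomer rate and the election-related cohort.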
For this hypothesis, only editors who made at least 1 edit to an election-related article in the first 30 days after registration are considered. Figure 5 shows the retention rate for every month since 2021, with values ranging between 0.14 and 0.43, notably higher than the rate reported for newcomers in general (e.g., 0.08 on English Wikipedia in January 2024, according to the wiki comparison dashboard).
To better understand newcomers who made at least one edit to an election-related article within their first 30 days after registration, two statistics are examined. Figure 6a presents boxplots comparing the number of revisions made during the first and second months. As expected, the number of revisions in the second month is markedly lower. Interestingly, the revert ratio of first-month edits, shown in Figure 6b, is lower among editors who continued contributing in the second month (retained editors). Various factors may explain this: for example, retained editors may have developed a better understanding of Wikipedia's editing norms, or reverts may have discouraged other newcomers from continuing to edit ("don't bite the newbies"[21]).
Figure 6: Statistics of newcomers since 2021 making at least 1 edit to an election-related article in the first 30 days after registration: (a) revision count; (b) revert ratio of first-month revisions.
To provide context, Figure 7 compares them with all newcomers since 2021. Newcomers who edit an election-related article within their first month tend to make more edits in both their first and second months, suggesting a higher motivation to edit Wikipedia.
Figure 7: Comparison with all newcomers since 2021: (a) revision count in the first month; (b) revision count in the second month.
Hypothesis 6
Registration rates increase sharply from the date of the election (and remain elevated until inauguration).
For this hypothesis, Figure 8 shows the daily count of newly registered accounts on English Wikipedia from January 2024 until Inauguration Day. The data shows no evidence of a sharp increase in registration rates following the election. The only notable spike occurred on January 14, with 9,183 new accounts. Of these, 6,500 were globally locked for abuse, suggesting that this sudden increase resulted from a coordinated effort to disrupt editing on Wikipedia.
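A minimal sketch of the daily registration count, with an option to exclude globally locked accounts so that abuse-driven spikes can be separated from organic growth. The input format and the spike rule (a day exceeding `k` times the median daily count) are hypothetical choices for illustration; the source does not state how the spike was identified.

```python
from collections import Counter
from datetime import date, datetime
from statistics import median
from typing import Iterable


def daily_registrations(accounts: Iterable[tuple[datetime, bool]], exclude_locked: bool = False) -> Counter:
    """Count new accounts per day; each account is (registration time, is_globally_locked)."""
    return Counter(ts.date() for ts, locked in accounts if not (exclude_locked and locked))


def spike_days(counts: Counter, k: float = 3.0) -> list[date]:
    """Days whose count exceeds k times the median daily count (hypothetical rule)."""
    m = median(counts.values())
    return sorted(d for d, c in counts.items() if c > k * m)
```

With the figures reported above, 6,500 of the 9,183 accounts registered on January 14 (about 71%) were globally locked.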
Discussion
This analysis examined several hypotheses concerning the 2024 US Presidential Election on English Wikipedia. The main findings are:
- An increasing number of election-related Wikipedia pages were protected as the election approached.
- Although IP and newcomer edits increased in the lead-up to the election, a decreasing share of them was reverted.
- Most edits in the sample identified as low-quality were reverted, typically within a few minutes.
- No increase in traffic was observed on election day. However, the following day, when the winner was announced, showed an unusual preference for mobile access in the morning.
- Newcomers who edited election-related articles had a higher retention rate than newcomers in general.
- There was no evidence of a sharp increase in registration rates following the election.
While the hypotheses have been studied individually, some are likely interrelated. In particular, the increased protection of Wikipedia articles, which prevents potentially disruptive edits to pages of great interest to abusive editors, may have contributed to the declining revert ratio of IP and newcomer edits. A remarkable finding is the rapid reversion of low-quality edits, often within minutes, which reflects the effectiveness of community-driven content moderation on Wikipedia. The retention rate of newcomers editing election-related articles is also noteworthy. This observation motivates future work on the retention of newcomers editing articles about other (non-political) events and topics; such a comparison could help identify which areas and types of content are most effective at retaining new editors.
Last but not least, some limitations of this analysis should be taken into account:
- Wikipedia is multilingual: The English Wikipedia is one of more than 300 language editions of Wikipedia available for research[27]. While it is the largest and most widely used edition, especially in the United States, other editions (e.g., Spanish Wikipedia) could also be highly relevant for this specific case study. Furthermore, expanding the current analysis to include other language editions would help determine whether these findings are consistent across different projects, including smaller ones with fewer resources for patrolling[28].
- Many content moderation techniques exist: Page protections and reverts are two essential content moderation techniques on Wikipedia. Nevertheless, moderators employ a wide range of other mechanisms to maintain the integrity of content, including tools like AbuseFilter, PageTriage, and FlaggedRevs. Building on ongoing efforts to develop a working definition for moderation activity and moderator, future work could assess the effectiveness of other content moderation techniques during the 2024 US Presidential election.
- Qualitative approaches to identifying disinformation: In this analysis, machine learning was used to identify low-quality edits based on their probability of being reverted. While this approach effectively detects vandalism, it cannot capture more subtle forms of abuse and misinformation (e.g., biased narratives, hoaxes, data voids). These threats to knowledge integrity are challenging to identify through computational approaches alone. It is therefore crucial to complement and contrast such analyses with the qualitative efforts of the WMF's Disinformation Response Teams (DRTs).
Acknowledgements
We would like to express sincere gratitude to Sam Walton, Sonja Perry, Kirsten Stoller, and Kate Zimmerman for reviewing earlier versions of this report and providing constructive feedback.
Corollary
2024 was a landmark year for elections, with more people eligible to vote than ever before in human history; besides the US election, these included the European Parliament election and the Indian general election. As a result, additional work is under way to examine some of these hypotheses for those events.
References
- Brown, A. R. (2011). Wikipedia as a data source for political scientists: Accuracy and completeness of coverage. PS: Political Science & Politics, 44(2), 339-343.
- Twyman, M., Keegan, B. C., & Shaw, A. (2017). Black Lives Matter in Wikipedia: Collective memory and collaboration around online social movements. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 1400-1412).
- Herrmann, M., & Döring, H. (2023). Party positions from Wikipedia classifications of party ideology. Political Analysis, 31(1), 22-41.
- Agarwal, P., Redi, M., Sastry, N., Wood, E., & Blick, A. (2020). Wikipedia and Westminster: Quality and dynamics of Wikipedia pages about UK politicians. In Proceedings of the 31st ACM Conference on Hypertext and Social Media (pp. 161-166).
- Neff, J. J., Laniado, D., Kappler, K. E., Volkovich, Y., Aragón, P., & Kaltenbrunner, A. (2013). Jointly they edit: Examining the impact of community identification on political interaction in Wikipedia. PLoS ONE, 8(4), e60584.
- Shi, F., Teplitskiy, M., Duede, E., & Evans, J. A. (2019). The wisdom of polarized crowds. Nature Human Behaviour, 3(4), 329-336.
- Ciocirdel, G. D., & Varga, M. (2016). Election prediction based on Wikipedia pageviews.
- Smith, B. K., & Gustafson, A. (2017). Using Wikipedia to predict election outcomes: Online behavior as a predictor of voting. Public Opinion Quarterly, 81(3), 714-735.
- Yasseri, T., & Bright, J. (2016). Wikipedia traffic data and electoral prediction: Towards theoretically informed models. EPJ Data Science, 5, 1-15.
- Mestyán, M., Yasseri, T., & Kertész, J. (2013). Early prediction of movie box office success based on Wikipedia activity big data. PLoS ONE, 8(8), e71226.
- Miller, S., El-Bahrawy, A., Dittus, M., Graham, M., & Wright, J. (2020). Predicting drug demand with Wikipedia views: Evidence from darknet markets. In Proceedings of The Web Conference 2020 (pp. 2669-2675).
- Priedhorsky, R., Osthus, D., Daughton, A. R., Moran, K. R., Generous, N., Fairchild, G., ... & Del Valle, S. Y. (2017). Measuring global disease with Wikipedia: Success, failure, and a research agenda. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 1812-1834).
- Keegan, B. C. (2019). The dynamics of peer-produced political information during the 2016 US Presidential Campaign. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1-20.
- Formisano, G., Hine, E., Juneja, P., Laitila, J., Novelli, C., Chiu, E., ... & Floridi, L. (2024). Counter-Misinformation Dynamics: The Case of Wikipedia Editing Communities during the 2024 US Presidential Elections. Available at SSRN.
- Kumar, S., West, R., & Leskovec, J. (2016). Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes. In Proceedings of the 25th International Conference on World Wide Web (pp. 591-602).
- Elebiary, A., & Ciampaglia, G. L. (2023). The role of online attention in the supply of disinformation in Wikipedia. arXiv preprint arXiv:2302.08576.
- Saez-Trumper, D. (2019). Online disinformation and the role of Wikipedia. arXiv preprint arXiv:1910.12596.
- Rzeszotarski, J., & Kittur, A. (2012). Learning from history: Predicting reverted work at the word level in Wikipedia. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (pp. 437-440).
- Flöck, F., Vrandečić, D., & Simperl, E. (2012). Revisiting reverts: Accurate revert detection in Wikipedia. In Proceedings of the 23rd ACM Conference on Hypertext and Social Media (pp. 3-12).
- Trokhymovych, M., Aslam, M., Chou, A. J., Baeza-Yates, R., & Saez-Trumper, D. (2023). Fair multilingual vandalism detection system for Wikipedia. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 4981-4990).
- Halfaker, A., Kittur, A., & Riedl, J. (2011). Don't bite the newbies: How reverts affect the quantity and quality of Wikipedia work. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (pp. 163-172).
- Hill, B. M., & Shaw, A. (2015). Page protection: Another missing dimension of Wikipedia research. In Proceedings of the 11th International Symposium on Open Collaboration (pp. 1-4).
- Ruprechter, T., Ribeiro, M. H., West, R., & Helic, D. (2023). Protection from Evil and Good: The Differential Effects of Page Protection on Wikipedia Article Quality. arXiv preprint arXiv:2310.12696.
- Ajmani, L., Vincent, N., & Chancellor, S. (2023). Peer Produced Friction: How Page Protection on Wikipedia Affects Editor Engagement and Concentration. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2), 1-33.
- Piccardi, T., Gerlach, M., Arora, A., & West, R. (2023). A large-scale characterization of how readers browse Wikipedia. ACM Transactions on the Web, 17(2), 1-22.
- Ducheneaut, N. (2005). Socialization in an open source software community: A socio-technical analysis. Computer Supported Cooperative Work (CSCW), 14, 323-368.
- Johnson, I., & Lescak, E. (2022). Considerations for Multilingual Wikipedia Research. arXiv preprint arXiv:2204.02483.
- Morgan, J. (2019). Patrolling on Wikipedia. https://meta.wikimedia.org/wiki/Research:Patrolling_on_Wikipedia
Notes
- Although Robert F. Kennedy Jr. has been nominated by Donald Trump to serve as US Secretary of Health and Human Services, he was listed as an independent (withdrawn) in Template:2024 United States presidential election.
- For context, the percentage of reverted revisions in the training dataset of the multilingual revert risk model is 8%.
- This threshold is based on work with the Wikipedia Knowledge Integrity Risk Observatory.