Research:A Quantitative Perspective on Collective Memory with Wikipedia Affordances

Contact
Prof. Brian C. Keegan
Duration: August 2023 – May 2024
collective memory, quantitative methods, Arab Spring, cross-language
This page documents a completed research project.


Collective memory is the communal act of sense-making that results in community-based perspectives of the past. It is difficult to study empirically across linguistic and cultural contexts because the evidence and sense-making processes involved are so diverse. Wikipedia has become a test case, a “global memory place”, for understanding and comparing the emergence and evolution of collective memory processes. Within it, the Arab Spring is a major demonstration of internet-mediated collective action that crossed linguistic barriers, and it has become a salient case study for understanding collective memory on Wikipedia. Different Wikipedia language editions are not required to have identical or even similar content about a topic. Because the content of every edit made to a Wikipedia article is stored in a revision history that can be retrieved and analyzed, it is possible to compare historical versions of articles. Researchers can leverage this temporal variation in article content within and across languages to identify the emergence of consensus perspectives, conflicts, and other processes associated with the construction and contestation of collective memory. This paper examines the construction of collective memory in the Arabic and English Wikipedia articles about the events of the 2011 Arab Spring by focusing on two types of links within articles. The first type is “outlinks” from one article to another (also known as “blue links”), which are strong signals of similarity between articles based on editors’ judgments of relevance and context. The second type is inter-lingual links (ILLs), which identify similar topics in other language editions. This paper compares these links across articles and languages to detect which links are introduced (or removed), and when, as proxies for collective memory processes.
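To make the notion of an outlink concrete, here is a minimal sketch that pulls article links out of one revision's wikitext. It is an illustration under stated assumptions, not the project's own pipeline: it assumes the wikitext has already been retrieved (for example via the MediaWiki Action API's `prop=revisions` module), it only sees links written literally in the wikitext (links added by templates such as navigation boxes are missed), and the namespace and language-code filters are hypothetical and incomplete.

```python
import re

# [[Target]], [[Target|label]] or [[Target#Section|label]]; group 1 is the target title.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Hypothetical, incomplete filters for links that are not article-to-article outlinks.
NON_ARTICLE_PREFIXES = ("file:", "image:", "category:", "template:", "wikipedia:",
                        "help:", "ملف:", "تصنيف:")  # the last two: Arabic File:, Category:
# Older revisions (e.g. from 2011) still carried in-text ILLs such as [[ar:...]].
INTERLANG_RE = re.compile(r"^[a-z]{2,3}(?:-[a-z]+)*:")


def normalize_title(raw: str) -> str:
    """Approximate MediaWiki title normalization: underscores to spaces, first letter upper-cased."""
    title = re.sub(r"[\s_]+", " ", raw).strip()
    return title[:1].upper() + title[1:]


def extract_outlinks(wikitext: str) -> set[str]:
    """Return the set of article titles linked directly from one revision's wikitext."""
    outlinks = set()
    for match in WIKILINK_RE.finditer(wikitext):
        raw = match.group(1).strip()
        if not raw or INTERLANG_RE.match(raw):
            continue  # empty target or an interlanguage link, not an outlink
        title = normalize_title(raw)
        if title.lower().startswith(NON_ARTICLE_PREFIXES):
            continue  # files, categories and other non-article namespaces
        outlinks.add(title)
    return outlinks
```

Links nested inside image captions are still picked up, because the outer `[[File:...]]` pattern fails to match and the scan resumes at the inner link.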


Methods

This paper contributes to two related empirical literatures that use Wikipedia data. The first compares different Wikipedia language editions in terms of their structure, content, and collaboration practices such as deliberation. This literature often focuses on macroscopic patterns involving the entire project and on comparisons at a particular point in time, rather than on more situated cases or on the evolution of content. The second uses Wikipedia data to understand collective memory processes. This literature often focuses on the collaborations and content in the days and weeks immediately following major events, rather than on collective memory processes that may unfold over years or decades. To address both limitations, we focus on a sample of articles related to the events of the 2011 Arab Spring, which unfolded 13 years ago, to characterize and compare the “after-lives” of content about major historical events. This paper presents an ensemble of quantitative methods to develop a grounded understanding of four behaviors related to collective memory formation: salience, deliberation, contextualization, and consolidation.

The Arab Spring transformed our understanding of contemporary collective action. Applying our more complex definition of collective memory to the Arab Spring summary articles in English and Arabic allows a more thorough understanding of the entire collective memory process as it is presented on Wikipedia. To analyze the salience, deliberation, contextualization, and consolidation of collective memory around the Arab Spring, we identified four questions to motivate our work:

1. Is the concept salient throughout the time period in question, pointing to a continuing collective memory process? Through analysis of article size and number of outlinks, we find that the Arab Spring remains salient across languages from 2011 until early 2024 (the present).

2. Do deliberation processes vary across languages, creating divergent perspectives in the Wikipedia articles? By clustering outlinks according to their temporal inclusion, we find varying deliberation processes, highlighted by previously latent inclusion themes that we label “Stable”, “Debated”, and “Forgotten”.

3. How similar are the outlinks contextualizing the event across languages over time? Leveraging the ILLs of outlinks, we find not only that the outlinks differ across languages but also that the majority of outlinks included post-phenomenon are isolated within each language version of the article.

4. When do articles about related concepts link to the Arab Spring article, consolidating it within a broader context? Ego-network analysis shows that English articles about countries involved in the Arab Spring reference the Arab Spring article, whereas the corresponding Arabic articles do not. When the Arab Spring is broken down into its individual events, both languages show contested inclusion of the back-reference to the phenomenon article.
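The four questions above all start from the same raw material: which outlinks an article contains at each point in its revision history. The page does not say how revisions were sampled, so the sketch below assumes (hypothetically) one snapshot per calendar month, taken from the last revision in that month, with each revision already reduced to its set of outlinks. From those snapshots it builds a 0/1 temporal inclusion vector for every outlink (the input to the clustering in question 2) and the outlink count per snapshot (one ingredient of salience in question 1). All function names are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple

Snapshot = Tuple[str, Set[str]]  # (period label, outlinks present in that period's revision)


def sample_last_per_period(revisions: Iterable[Tuple[str, Set[str]]],
                           period_len: int = 7) -> List[Snapshot]:
    """Keep the last revision in each period; revisions are (ISO timestamp, outlinks) in time order.

    period_len=7 truncates '2011-02-14T10:00:00Z' to the month '2011-02' (an assumed granularity).
    """
    latest: Dict[str, Set[str]] = {}
    for timestamp, links in revisions:
        latest[timestamp[:period_len]] = links  # later revisions overwrite earlier ones
    return sorted(latest.items(), key=lambda item: item[0])


def inclusion_vectors(snapshots: List[Snapshot]) -> Dict[str, List[int]]:
    """Map every outlink ever seen to a 0/1 vector: 1 where it is present in that snapshot."""
    all_links = sorted(set().union(*(links for _, links in snapshots)))
    return {link: [int(link in links) for _, links in snapshots] for link in all_links}


def outlink_counts(snapshots: List[Snapshot]) -> List[Tuple[str, int]]:
    """Number of outlinks per snapshot, one ingredient of the salience measure."""
    return [(label, len(links)) for label, links in snapshots]
```

Months with no edits simply do not appear in the output; a fuller version would carry the previous revision forward so that every vector covers the same calendar grid.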

Timeline

Fall 2023 - Collect data

Winter 2023/2024 - Analyze data

Spring 2024 - Draw conclusions and write up results

Overall output: a peer-reviewed paper published in either a conference or journal format

Status: Submitted, awaiting review

Policy, Ethics and Human Subjects Research

The Arabic and English Wikipedias’ coverage of the Arab Spring is structured by distinct collective memory practices. We developed an ensemble of generalizable quantitative methods grounded in definitions from memory studies to enable comparisons of collective memory processes across languages and over time. These methods could be extended to other cases to examine whether the processes observed on the English Wikipedia are reliably distinctive because of its size and influence, or whether other language editions with similar cultural proximity to these events (Farsi, Hebrew, Turkish, etc.) resemble each other more than they resemble an outlier like English. Similarly, alternative constructs for collective memory processes and their empirical operationalizations might identify greater similarities or stronger differences.

These findings also contribute to revisiting theories and methods for understanding collective memory processes. The concept of collective memory has been criticized from historiographical and cultural studies perspectives for reproducing dominant or revisionist historical narratives over alternative interpretations and marginalized perspectives. Wikipedia’s rules governing citations emphasize the use of reliable sources, which may similarly privilege some framings and interpretations over others. Although major contemporary historical events like the Arab Spring should attract more representative cross-sections of editorial perspectives, Wikipedia is not immune to motivated editors “capturing” pages to enforce specific narratives. Future work should examine whether the “afterlives” of articles documenting major historical events are prone to capture or unreliable narratives because of declining editorial attention.

Our research design did not analyze the structure, dynamics, or content of the discussions that accompanied the collaborations generating these articles. English and Arabic are each global languages with colonial legacies that do not map as cleanly onto national boundaries as languages like Swedish, Hindi, or Japanese. Collective memory cases drawn from those contexts may exhibit stronger self-focus biases than we saw with Arabic and English. Mechanisms such as overlaps among contributors across articles and languages, conflicts over content, and processes for resolving disputes and forming consensus could have influenced the decisions to include or remove links that we observed over time. A closer analysis of the content, discussions, and editors’ activity could provide a richer description of the additional forces shaping these articles' content, including and beyond outlinks.

The clustering of outlinks based on similarities in their temporal inclusion ignores other constructs for measuring the similarity of links and articles over time. Word embeddings trained on the content of out-linked articles, coauthorship patterns of contributing editors, or similarities in pageview activity could all provide additional context and counterfactuals for the observed temporal clustering. Understanding how outlink temporal inclusion vectors align with these other measures of similarity could have implications for large language models and other natural language models trained on Wikipedia data: because different language editions encode different content, such models could mistakenly treat divergent content as equivalent and amplify the resulting biases. Characterizing the (mis)alignments across languages and over time is important for identifying and preventing these problems.
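One way to quantify the alignment raised above is to compute pairwise similarities between links under two measures and correlate them. The sketch assumes each alternative measure (for example a hypothetical monthly pageview series per linked article) is already available as a numeric vector keyed by link title; neither those data nor this procedure come from the project page. It uses cosine similarity for both measures and a Pearson correlation over link pairs.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise(features: Dict[str, Vector], sim: Callable[[Vector, Vector], float]) -> List[float]:
    """Similarity for every unordered pair of keys, in a fixed (sorted) order."""
    keys = sorted(features)
    return [sim(features[a], features[b]) for a, b in combinations(keys, 2)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN if there are fewer than two pairs or zero variance."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def alignment(inclusion: Dict[str, Vector], other: Dict[str, Vector]) -> float:
    """Correlate pairwise inclusion-vector similarity with pairwise similarity under another measure."""
    shared = set(inclusion) & set(other)  # only links described by both measures
    a = pairwise({k: inclusion[k] for k in shared}, cosine)
    b = pairwise({k: other[k] for k in shared}, cosine)
    return pearson(a, b)
```

Because link pairs are not independent of one another, the correlation is best treated as a descriptive statistic; a Mantel-style permutation test is the usual way to attach a significance level to this kind of matrix comparison.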

Results

How do collective memory processes unfold differently across different Wikipedia language editions? Previous research has either looked at collective memory processes or compared content across language editions; this paper provides novel empirical findings that address both. We proposed an ensemble of quantitative constructs for measuring four distinct collective memory processes identified in memory studies: salience, deliberation, contextualization, and consolidation. We evaluated these processes using a research design combining variation over time and across languages for articles related to the 2011 Arab Spring.

Salience was operationalized as the article's size and number of outlinks. Surprisingly, the English article shows greater variation in size and content over time than the Arabic article, following a pattern similar to punctuated equilibrium: periods of stability interrupted by sudden changes. In contrast, the Arabic article is both shorter and more stable. Deliberation was operationalized as similarity in the temporal vectors of outlink occurrence. The English article had five distinctive clusters of outlink-inclusion behavior, which we classified as “stable”, “debated”, and “forgotten” interpretations, while the Arabic article had only two clusters. Contextualization was operationalized as overlap in the inter-language link graph. Despite the availability of relevant topics across languages, the Arabic and English articles referenced very different concepts. Finally, consolidation was operationalized as references to the Arab Spring article in the articles about countries and their national-level protests. The Arab Spring article is linked much less often from Arabic articles about countries and events than from their English counterparts.
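The page does not specify which clustering algorithm produced the five English and two Arabic clusters, so the following is only a stand-in: single-linkage clustering of temporal inclusion vectors under Jaccard similarity, implemented with union-find, plus a per-cluster inclusion profile. The `threshold` value is a hypothetical tuning parameter, and labeling a cluster “stable”, “debated”, or “forgotten” from its profile remains an interpretive step.

```python
from itertools import combinations
from typing import Dict, List

Vectors = Dict[str, List[int]]


def jaccard(u: List[int], v: List[int]) -> float:
    """Jaccard similarity of two equal-length 0/1 inclusion vectors (1.0 if both are all zeros)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 1.0


def single_linkage_clusters(vectors: Vectors, threshold: float) -> List[List[str]]:
    """Single-linkage clustering: links end up together if a chain of pairs has Jaccard >= threshold."""
    parent = {link: link for link in vectors}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the union-find trees shallow
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if jaccard(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for link in vectors:
        groups.setdefault(find(link), []).append(link)
    return sorted(sorted(members) for members in groups.values())


def inclusion_profile(vectors: Vectors, members: List[str]) -> List[float]:
    """Share of a cluster's links present at each snapshot, for reading off its shape over time."""
    length = len(vectors[members[0]])
    return [sum(vectors[m][i] for m in members) / len(members) for i in range(length)]
```

Single linkage is easy to reason about but can chain together links that are only indirectly similar; average-linkage hierarchical clustering on the same vectors would be a reasonable alternative to compare against.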

These results, which highlight disparities in the content and dynamics of coverage about the Arab Spring, reinforce prior findings about differences in coverage of the same topics across Wikipedia’s language editions. The greater salience, deliberation, contextualization, and consolidation in the English article compared with the Arabic article for these major events is a surprising counterpoint to the “self-focus bias” found in other multilingual analyses. The English article employs links reflecting Western cultural biases about history and politics that do not appear in the Arabic version, which provides its own distinctive contexts. Despite the greater geographic and cultural proximity of the events of the Arab Spring to Arabic speakers, the English-language article has far greater dynamism in its content and relationships than the Arabic-language article. The comparative absence of links to the الربيع العربي (“Arab Spring”) article across Arabic-language articles about national histories, and even about national events, is particularly surprising.
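The contextualization and consolidation comparisons can be sketched as simple set operations. The code assumes a hypothetical `ar_to_en` dictionary mapping Arabic titles to their English counterparts (for instance built from the MediaWiki API's `prop=langlinks` module or from Wikidata sitelinks; the page does not say how ILLs were resolved), compares in one direction only, and reduces the ego-network analysis of consolidation to the share of related articles that link to the target article at a single point in time. Applying `backlink_share` to each snapshot would give an over-time view.

```python
from typing import Dict, Set


def ill_overlap(ar_links: Set[str], en_links: Set[str], ar_to_en: Dict[str, str]) -> Dict[str, object]:
    """Compare Arabic and English outlinks after mapping Arabic titles to English via ILLs."""
    mapped = {ar_to_en[t] for t in ar_links if t in ar_to_en}
    union = mapped | en_links
    return {
        "shared": sorted(mapped & en_links),
        # an English article exists, but the English article under study does not link to it
        "available_not_linked": sorted(mapped - en_links),
        # no English counterpart found through the ILL lookup
        "no_counterpart": sorted(t for t in ar_links if t not in ar_to_en),
        "jaccard": len(mapped & en_links) / len(union) if union else 1.0,
    }


def backlink_share(outlinks_by_article: Dict[str, Set[str]], target: str) -> float:
    """Share of related articles (e.g. country articles) whose outlinks include the target article."""
    if not outlinks_by_article:
        return 0.0
    hits = sum(1 for links in outlinks_by_article.values() if target in links)
    return hits / len(outlinks_by_article)
```

Separating “available but not linked” from “no counterpart” matters here: only the first reflects an editorial choice not to link a topic that exists in the other edition.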

Resources

See my talks on part of our methods at the Wiki-histories Conference in May 2023 (https://www.youtube.com/watch?v=sdW9zzgcaHI&list=PLDShh5CA5xNn1MHepHXWSj9nHE2kmAdrj) and at Wiki Workshop in July 2024 (https://www.youtube.com/watch?v=TrthZ-zJ8ow).

I presented a more complete view of our methods at the MENA and Asian Political Methodology conference in January 2024 and at the Visions in Methodology Conference in May 2024.

References

Adar, E.; Skinner, M.; and Weld, D. S. 2009. Information arbitrage across multi-lingual Wikipedia. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM ’09, 94–103. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-60558-390-7.

Al-Shehari, K.; and Al-Sharafi, A. G. 2022. Negotiating Wikipedia narratives about the Yemeni crisis: Who are the alleged supporters of the Houthis? Media, War & Conflict, 15(2): 183–201.

Confino, A. 1997. Collective Memory and Cultural History: Problems of Method. American Historical Review.

Dandala, B.; Mihalcea, R.; and Bunescu, R. 2012. Towards building a multilingual semantic network: identifying interlingual links in Wikipedia. In Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, SemEval ’12, 30–37. USA: Association for Computational Linguistics.

Ferron, M.; and Massa, P. 2011a. Collective memory building in Wikipedia: the case of North African uprisings. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration, WikiSym ’11, 114–123. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-4503-0909-7.

Ferron, M.; and Massa, P. 2011b. WikiRevolutions: Wikipedia as a Lens for Studying the Real-time Formation of Collective Memories of Revolutions. International Journal of Communication, 1313–1332.

Ferron, M.; and Massa, P. 2014. Beyond the Encyclopedia: Collective Memories in Wikipedia. Memory Studies, 7(1): 22–45.

Ford, H. 2022. Writing the Revolution: Wikipedia and the Survival of Facts in the Digital Age. MIT Press.

Grabowski, J.; and Klein, S. 2023. Wikipedia’s Intentional Distortion of the History of the Holocaust. The Journal of Holocaust Research, 37(2): 133–190.

Halbwachs, M. 1992. On Collective Memory. Chicago: University of Chicago Press, 1st edition. ISBN 978-0-226-11596-2.

Hale, S. A. 2014. Multilinguals and Wikipedia editing. In Proceedings of the 2014 ACM conference on Web science, WebSci ’14, 99–108. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-4503-2622-3.

He, S.; Lin, A. Y.; Adar, E.; and Hecht, B. 2018. The Tower of Babel.jpg: Diversity of Visual Encyclopedic Knowledge Across Wikipedia Language Editions. Proceedings of the International AAAI Conference on Web and Social Media, 12(1).

Hecht, B.; and Gergle, D. 2009. Measuring self-focus bias in community-maintained knowledge repositories. In Proceedings of the fourth international conference on Communities and technologies, C&T ’09, 11–20. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-60558-713-4.

Hecht, B.; and Gergle, D. 2010. The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, 291–300. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-60558-929-9.

Hickman, M. G.; Pasad, V.; Sanghavi, H. K.; Thebault-Spieker, J.; and Lee, S. W. 2021. Understanding Wikipedia Practices Through Hindi, Urdu, and English Takes on an Evolving Regional Conflict. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1): 34:1–34:31.

Kansteiner, W. 2002. Finding Meaning in Memory: A Methodological Critique of Collective Memory Studies. History and Theory, 41(2): 179–197.

Keegan, B. C. 2019. The Dynamics of Peer-Produced Political Information During the 2016 U.S. Presidential Campaign. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW): 33:1–33:20.

Kharazian, Z.; Starbird, K.; and Hill, B. M. 2023. Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias. arXiv:2311.03616.

Luyt, B. 2016. Wikipedia, collective memory, and the Vietnam war. Journal of the Association for Information Science and Technology, 67(8): 1956–1961.

Massa, P.; and Scrinzi, F. 2012. Manypedia: comparing language points of view of Wikipedia communities. In Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration, WikiSym ’12, 1–9. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-4503-1605-7.

Pentzold, C. 2009. Fixing the floating gap: The online encyclopaedia Wikipedia as a global memory place. Memory Studies, 2(2): 255–272.

Porter, E.; Krafft, P. M.; and Keegan, B. 2020. Visual Narratives and Collective Memory across Peer-Produced Accounts of Contested Sociopolitical Events. ACM Transactions on Social Computing, 3(1): 1–20.

Roy, D.; Bhatia, S.; and Jain, P. 2020. A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages. In Calzolari, N.; Béchet, F.; Blache, P.; Choukri, K.; Cieri, C.; Declerck, T.; Goggi, S.; Isahara, H.; Maegaard, B.; Mariani, J.; Mazo, H.; Moreno, A.; Odijk, J.; and Piperidis, S., eds., Proceedings of the Twelfth Language Resources and Evaluation Conference, 2373–2380. Marseille, France: European Language Resources Association. ISBN 979-10-95546-34-4.

Roy, D.; Bhatia, S.; and Jain, P. 2022. Information asymmetry in Wikipedia across different languages: A statistical analysis. Journal of the Association for Information Science and Technology, 73(3): 347–361.

Twyman, M.; Keegan, B. C.; and Shaw, A. 2017. Black Lives Matter in Wikipedia: Collective Memory and Collaboration around Online Social Movements. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW ’17, 1400–1412. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-4503-4335-0.

Yasseri, T.; Gildersleve, P.; and David, L. 2022. Chapter 9 - Collective memory in the digital age. In O’Mara, S. M., ed., Progress in Brain Research, volume 274 of Collective Memory, 203–226. Elsevier.

Zubrzycki, G.; and Woźny, A. 2020. The Comparative Politics of Collective Memory. Annual Review of Sociology, 46(1): 175–194.