Grants:Project/nschwitter/The Role of Offline Ties of Wikipedians/Final
This project is funded by a Project Grant
- Report accepted
- To read the approved grant submission describing the plan for this project, please visit Grants:Project/nschwitter/The Role of Offline Ties of Wikipedians.
- You may still review or add to the discussion about this report on its talk page.
- You are welcome to email projectgrants@wikimedia.org at any time if you have questions or concerns about this report.
Welcome to this project's final report! This report shares the outcomes, impact and learnings from the grantee's project.
Part 1: The Project
Summary
To what extent is online behaviour in a large online community affected by offline meetings between the members of the community? This question guided my PhD thesis and this project more specifically. To answer it, I collected the complete archive of all offline meetups, which will be made available as part of my publications for other researchers to use.
I found small but positive effects regarding contribution behaviour: compared to a control group of similar others, meetup attendees are slightly more active after a meetup. This is important to know as previous research regarding other online communities has often found that people withdraw from the online community and their online contacts after having attended offline gatherings. While I found that people collaborate slightly more with others after meeting them face-to-face, it does not seem to be the case that exclusionary cliques develop.
I also looked at effects of offline meetup participation on online elections and found that they matter: Meetup-goers are more involved in Wikipedia online elections, and even the direction of voting is influenced by the offline sphere: those who meet more pro-voters also tend to vote pro; and those who meet more contra-voters tend to vote contra. While I cannot explain these relationships uncovered (e.g. are users discussing upcoming or current elections at the meetups they attend and potentially come to a consensus, or are users voting like their friends or even feel pressured to vote in line with them?), my results highlight that it is important to consider offline meetups when thinking about public elections on Wikipedia.
My project has uncovered important dynamics between offline and online behaviour; future research should make use of the data I collected to answer questions which I left unanswered due to time, money and other constraints. While I focused on the German Wikipedia, future efforts should strive to scale similar analyses to other language versions. To make this easier, I have written up my lessons-learnt as part of learning patterns.
Project Goals
The following lists my project goals from my proposal page and discusses how I have met them:
- Identify and characterise all past offline meetups of the German Wikipedia community
I have collected all offline meetings organised in the German Wikipedia. I have characterised them in terms of whether they were social- or work-oriented, understood and (anecdotally) described their dynamics, and plotted their development over time and space. I have also described the population of meetup-goers and the network which has developed between the attendees. I have produced a dataset containing all offline meetings to share (as part of my publications).
- Understand the role of offline meetups for the Wikipedia online community
Offline meetings are an important part of the community - this is obvious from various discussions on Wikipedia itself as well as from the (scientific) literature review. My additional findings are discussed below.
- Understand their effect in regard to contributing behaviour
I merged the offline meetup data with online contribution behaviour from the data dumps. I matched meetup attendees with comparable non-attendees to assess a (causal) effect of meetup participation on contribution behaviour using a difference-in-differences design (quasi-experimental setup). It should be noted that the possibilities for identifying causal effects are very limited with observational digital trace data (as it is not possible to randomise people into attending meetings). I find that attending an offline meetup has a positive effect on the contribution behaviour of users. Users who have not made any edits in the time frame before the meetup are more likely to start editing after taking part in an offline get-together. Further, while it is not necessarily the case that users increase their contributions after a meetup in comparison to before the meetup, their reduction in contributions is smaller than the reduction a comparable control group experiences. Concerning collaboration, I find that attendees become slightly more likely to collaborate with one another, but there is no evidence that collaboration shifts towards users they have met at a meetup at the expense of those they have not met.
- Understand their effect in regard to reverting behaviour
I merged the offline meetup data with online contribution behaviour from the data dumps. Reverts on Wikipedia can be understood as a sort of norm-enforcing behaviour (following previous lines of research). I conceptually replicated and extended the study of Piskorski and Gorbatai (2017) who tested to what extent the density of a user's online collaboration network is of relevancy in regard to norms [1]. Overall, I found only very limited support for the argument brought forward by James Samuel Coleman[2] when focusing on the online network measures and only limited importance of the offline network. Users who attend meetups tend to both experience and conduct fewer norm violations, and they give and receive more rewards. However, the density of the offline network does not play a noteworthy role in explaining online norm violation and norm enforcement, except that those in high-density offline networks generally give, unexpectedly, fewer rewards. There is thus no support for Coleman's mechanism based on the offline network, but the results do suggest that those taking part in meetups behave somewhat differently online than those who do not meet up.
I focused more strongly on this aspect than on the target of reverts (as initially mentioned in the research questions). This could still be investigated going forward.
- Understand their effect in regard to election participation
I merged the offline meetup data with newly collected data on requests for adminship. Using within-between linear probability models to account for the nested/longitudinal data structure, I compared the offline network features of meeting attendees on a between- and within-level (i.e. compared them with other users and within the same user at different points in time). I found that offline participation measures only weakly influence whether a user runs for administrator in a given year. To a greater extent, however, the offline network affects whether one is successful as a candidate, whether one votes and whether one votes supportively: the larger the proportion of voters a candidate has met, the more likely they are to win, and the higher the proportion of other voters a user has met, the more likely they are to vote themselves (this also holds true for the direction of votes: the more pro-voters a user knows, the more likely they are to vote supportively, and the more anti-voters they have met, the less likely they are to vote supportively). Users are also more likely to vote if they have met the candidate, and they tend to support those more central in the meetup network.
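The between/within comparison described above can be illustrated with a minimal sketch. It only shows the decomposition step that within-between models rely on: each time-varying variable is split into a user's own mean (the between part) and the yearly deviation from that mean (the within part). The panel rows, the column names `user`, `year`, `meetings` and `ran`, and the helper `within_between` are hypothetical and not taken from the project's code; fitting the multilevel linear probability model itself is not shown.

```python
from collections import defaultdict
from statistics import mean


def within_between(rows, user_key, var):
    """Split a time-varying variable into a between part (the user's mean)
    and a within part (the deviation from that mean in a given year)."""
    values_by_user = defaultdict(list)
    for row in rows:
        values_by_user[row[user_key]].append(row[var])
    user_means = {user: mean(vals) for user, vals in values_by_user.items()}

    decomposed = []
    for row in rows:
        user_mean = user_means[row[user_key]]
        decomposed.append({
            **row,
            var + "_between": user_mean,           # compares users with each other
            var + "_within": row[var] - user_mean,  # compares a user with themselves over time
        })
    return decomposed


# Hypothetical user-year panel (illustrative values only)
panel = [
    {"user": "A", "year": 2015, "meetings": 0, "ran": 0},
    {"user": "A", "year": 2016, "meetings": 4, "ran": 1},
    {"user": "B", "year": 2015, "meetings": 1, "ran": 0},
    {"user": "B", "year": 2016, "meetings": 1, "ran": 0},
]

decomposed = within_between(panel, "user", "meetings")
for row in decomposed:
    print(row["user"], row["year"], row["meetings_between"], row["meetings_within"])
```

In such a setup, the binary outcome (here `ran`) would then be regressed on both parts, so that differences between users and changes within a user receive separate coefficients.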
- Publish the results in Wikimedia community discussion forums to allow for reflection upon the necessity to create more inclusive offline meetups
I took part in the WikiWorkshop 2022, posted a report in Der Kurier, and presented at the Digitaler Themenstammtisch in December. However, this is an ongoing task and goal (see also next steps); I will repeat the presentation at the Digitaler Themenstammtisch in January.
- Create documents/guidelines documenting the data collection and analysis to allow others to apply and reproduce the analyses conducted in other contexts (e.g. other language versions of Wikipedia)
I documented my process of data collection and data analysis throughout. I wrote them up as learning patterns as they fit my intention ideally. They explain my strategy to repeat them in other contexts and they can be adapted by others and myself in the future. I also use them to share code-snippets of my approach.
Project Impact
Important: The Wikimedia Foundation is no longer collecting Global Metrics for Project Grants. We are currently updating our pages to remove legacy references, but please ignore any that you encounter until we finish.
Targets
- In the first column of the table below, please copy and paste the measures you selected to help you evaluate your project's success (see the Project Impact section of your proposal). Please use one row for each measure. If you set a numeric target for the measure, please include the number.
- In the second column, describe your project's actual results. If you set a numeric target for the measure, please report numerically in this column. Otherwise, write a brief sentence summarizing your output or outcome for this measure.
- In the third column, you have the option to provide further explanation as needed. You may also add additional explanation below this table.
Planned measure of success (include numeric target, if applicable) | Actual result | Explanation |
Conclude whether offline meetups are a relevant factor for contributing towards Wikipedia. | They are! | More detailed results on this can be found in the other sections. |
Publish results in Wikimedia community forums such as Der Kurier. | Article, Digitaler Themenstammtisch | Article in the community forum and presentation at the digital topic discussion round (in December and upcoming talk in January). |
PhD thesis which will be submitted in 2022. | I submitted the thesis in September 2022 and defended in December 2022. | Link to the thesis / Link to the thesis via Webcat |
Presentation at a WikiWorkshop. | Paper using German meetup data | |
Publish dataset on meetups | Dataset available on OSF | All details in preprint |
Publish dataset on elections | Dataset available on OSF | |
Write up and document the data collection and the analyses that were conducted in the fashion of a hands-on tutorial. | learning pattern on collecting data on offline meetups, learning pattern on collecting data on requests for adminship, learning pattern on analysing effects of offline meetups | I have published them in the form of learning patterns. |
Story
Looking back over your whole project, what did you achieve? Tell us the story of your achievements, your results, your outcomes. Focus on inspiring moments, tough challenges, interesting anecdotes or anything that highlights the outcomes of your project. Imagine that you are sharing with a friend about the achievements that matter most to you in your project.
- This should not be a list of what you did. You will be asked to provide that later in the Methods and Activities section.
- Consider your original goals as you write your project's story, but don't let them limit you. Your project may have important outcomes you weren't expecting. Please focus on the impact that you believe matters most.
To what extent is online behaviour in a large online community affected by offline meetings between the members of the community? This question guided my PhD thesis and this project more specifically. Using Wikipedia as a case study - one of the largest and most successful and sustainable examples of online peer production - I embedded it into a sociological context and focused on one of the facts about the Wikipedia community that is least known to the general reader: Wikipedians meet offline, often, regularly and across the globe. In the typical spirit of Wikipedia, these meetings are organised publicly and are well-documented with lists of attendees, minutes, and photo evidence.
These offline meetings are important to the Wikimedia community but rather neglected by the scientific community. Wikipedia as a whole is often used by computer scientists, but the goldmines of data it provides are relatively untouched by (computational) social scientists. As part of this project, I collected the complete archive of all offline meetups. As part of my publications, this dataset will also be made openly available. This will hopefully not only allow other researchers to use the data but also motivate others to keep the list updated - the impacts of the Covid-19 pandemic on the offline meetup scene of Wikipedians would be interesting to explore.
While exploring this goes beyond my research study, I focused on how offline meetup participation affects three different domains of online behaviour: 1) productivity, 2) norm-related behaviour, and 3) voting participation. I found small but positive effects regarding contribution behaviour: compared to a control group of similar others, meetup attendees are slightly more active after a meetup. This is important to know as previous research regarding other online communities has often found that people withdraw from the online community and their online contacts after having attended offline gatherings. While I found that people collaborate slightly more with others after meeting them face-to-face, it does not seem to be the case that exclusionary cliques develop.
Regarding the second domain, I was challenged with how to best define and operationalise important social scientific concepts like norm violations, norm enforcement and rewards. While I initially wanted to closely replicate a previous study using the English Wikipedia (Piskorski and Gorbatai 2017), I diverged from this as it proved difficult to use the same definitions in the German Wikipedia. This highlighted how important it is to know the context well.
Thirdly, I looked at effects of offline meetup participation on online elections and found that they matter: Meetup-goers are more involved in Wikipedia online elections, and even the direction of voting is influenced by the offline sphere: those who meet more pro-voters also tend to vote pro; and those who meet more contra-voters tend to vote contra. While I cannot explain these relationships uncovered (e.g. are users discussing upcoming or current elections at the meetups they attend and potentially come to a consensus, or are users voting like their friends or even feel pressured to vote in line with them?), my results highlight that it is important to consider offline meetups when thinking about public elections on Wikipedia.
In many ways, this research project felt like a starting point for future research. I have collected a large dataset which can and should be used to better understand the effects of offline meetups on an online community. My research has been valued by Wikipedians and Wikimedians, and the data offered by Wikimedia is a goldmine for social science researchers. Given time and scope, I had to restrict my project, but I believe there are many important avenues for future research.
Survey(s)
If you used surveys to evaluate the success of your project, please provide a link(s) in this section, then briefly summarize your survey results in your own words. Include three interesting outputs or outcomes that the survey revealed.
No surveys used.
Other
Is there another way you would prefer to communicate the actual results of your project, as you understand them? You can do that here!
I have analysed the effects of offline meetings on three different domains of online behaviour. I will outline this in the following and present descriptive and inferential results.
Analysis of meetups
Qualitative results:
- Meetings are generally friendly places of community: These meetups bring together editors of Wikipedia, giving the anonymous usernames a face.
- Regional-based: Most meetups (the most common form is the informal Stammtisch) are organised locally, and the purpose is mostly to socialise and get to know each other.
- Project-based: Project-oriented meetings tend to include users from different geographical areas sharing a topical interest.
- Supra-national: Every once in a while, meetups that require more extensive planning take place which are supra-regional in nature. These can attract dozens of attendees from different parts of the country/area.
- A community develops at such meetings and there is ample evidence for this (e.g. attending a funeral together after the passing of previous attendees).
- There also exists conflict at meetings:
- In some cases, users were disappointed by the social and informal nature of these meetings, for example by the lack of structure and of introductions of participants. While such negative feelings are acknowledged by some, others also highlight that it requires effort by newcomers to join established meetup cliques. Even if meetings are not appreciated by all, the regulars seem to become rather defensive about their meetup culture.
- In some cases, users have not attended meetings depending on the other users attending. For example, users have mentioned that attendees seem to be a rather selective group of people, in particular made up of administrators. Users have reasoned not to come as they expected those meetups to be "meetings for administrators and insiders instead of for real authors".
- Users also tend to be rather hesitant to come when journalists are present. Journalists often tend to be seen as external intruders.
- There are also instances of considerable conflict which tend to include the Wikimedia Foundation.
- Specifically, lines of conflict can occur in cities with community spaces: Community spaces exist in a number of German cities, are generally supported by the Wikimedia Foundation and offer a base for both staff members of the Foundation and engaged Wikipedians. Community spaces often grew out of an active meeting community in a city but, once established, can lead to conflicts. In most cities, they co-exist peacefully (however, the establishment of a community space often leads to a reduction in the frequency of general meetings), but there are cases of disagreement about how things should be organised.
- The handling of blocked users is a point of discussion: In some instances, there were explicit anti-invitations of some users (being de-invited after also having been blocked). Some users agreed with these practices, while others spoke against them.
- Such conflicts can lead to a split of the meetup community with alternative meetups organised.
- (Perceived) Inequality of meeting access: While Wikipedia meetups are generally open to all, a certain reluctance to join them is observable on the organisational pages of multiple regional portals, and skewed distributions of attendee demographics are also sometimes directly discussed (particularly a skewed gender distribution: a share of women of 0%-20% appears to be the norm). In many cases, editors that are or consider themselves to be in a minority on Wikipedia (e.g. newcomers, young editors, women) are hesitant to join local meetups.
Quantitative, descriptive results:
- 4408 meetings were recorded in the German Wikipedia (excluding very large and extremely regular meetups organised through community spaces; 10 very large meetups without an inherently social component were excluded as I generally assume that people actually meet each other at meetings. The extremely regular meetups organised through community spaces exhibited a different dynamic from other meetings, often only consisting of a very small core group of users attending, and were also difficult to collect as people stopped signing up).
- The first meeting took place on October 28, 2003 with 5 attendees in Munich. The last collected ones (before the outbreak of the Coronavirus pandemic) took place on March 13, 2020, in Cologne and Leipzig.
- Around 3/4 of meetings collected can be classified as primarily social, i.e. not having mainly the intention to work on Wikipedia. The proportion of work meetings has increased over the years.
- 89% of the meetings have taken place in Germany, 6% in Austria, 4% in Switzerland. The remaining roughly 1% of meetings have taken place in locations all across the globe.
- The average number of attendees per meetup is 8.4 (mean; median of 7) with a minimum of 1 (meaning there were meetups where users were alone) and a maximum of 119 (I excluded very large meetups without a social character because I assume in my analyses that attendees of a meeting have actually met).
- The average number of meetups of a Wikipedian who is in the meetup network at all (i.e. went to at least one meeting) is 9.2 (mean; median of 2) with a minimum of 1 and a maximum of 289 meetups.
- In the user network (network connecting users with other users who have attended the same meetup) there are 4013 nodes sharing 102'738 edges (density of 0.013). This means that 4013 users have taken part in meetings and created 102'738 relationships with each other.
- The mean of the number of times users have met is 2.3 (median of 1), with a minimum of 1 and a maximum of 153.
- The degree of Wikipedians relates to the number of other users they have met through meetups. The average degree in the user network is 51.2 (mean; median is 22) with a minimum of 1 and a maximum of 1141.
- The diameter (the longest of the shortest paths between any two users) of the network is 8 (average path length 2.72); this reflects a "small world".
- On average, users who go to meetings were active on Wikipedia for 921.2 days before their first meeting (days since first edit; range -3824.1 to 5968.2). This means some users were active in the offline component before contributing to the German Wikipedia.
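The network figures above follow from standard definitions and can be checked with a short sketch. It builds a co-attendance network from attendee lists and computes density as 2E / (N(N-1)) and mean degree as 2E / N. The toy attendee lists and the function names are hypothetical; treating users who only ever attended alone as outside the network is an assumption made here because the reported minimum degree is 1.

```python
from collections import Counter
from itertools import combinations


def co_attendance_edges(meetups):
    """Map each unordered pair of users to the number of meetups both attended."""
    edges = Counter()
    for attendees in meetups:
        # set() drops duplicate sign-ups; sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(attendees)), 2):
            edges[pair] += 1
    return edges


def density(n_nodes, n_edges):
    """Share of all possible undirected ties that actually exist."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def mean_degree(n_nodes, n_edges):
    """Average number of distinct other users a user has met."""
    return 2 * n_edges / n_nodes


# Hypothetical toy data: each inner list is one meetup's attendee list
meetups = [["A", "B", "C"], ["A", "B"], ["C", "D"], ["E"]]
edges = co_attendance_edges(meetups)
nodes = {user for pair in edges for user in pair}  # assumption: solo attendees are not nodes

print(len(nodes), len(edges), edges[("A", "B")])

# Plugging in the node and edge counts reported in the list above
print(round(density(4013, 102738), 3))
print(round(mean_degree(4013, 102738), 1))
```

The edge weights in `edges` correspond to the number of times two users have met, and the last two lines agree with the reported density of 0.013 and mean degree of 51.2.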
Analysis of contribution behaviour
When analysing the productivity of Wikipedians, I was interested in a potential (causal) effect of meetup participation on the extent of contribution and collaboration on Wikipedia.
- I found that there is a change in the level of activity observable around meetup dates, varying to a large degree between users.
- On average, meeting-goers make 39.4 edits in the week before the meeting (minimum 0, maximum 7868) and 37.6 in the week after the meeting (minimum 0, maximum 11'822) in the Wikipedia mainspace.
- On average, meeting-goers make 1992.6 edits in the year before the meeting (minimum 0, maximum 633'557) and 1819.8 in the year after the meeting (minimum 0, maximum 329'932) in the Wikipedia mainspace.
- For the analyses, I took the cube root of the change in the after-before activity as depicted in the stacked barplot (which looks at different time frames and differentiates between mainspace and total activity across all namespaces).
- For the analysis, a control group of similar other users was constructed so that for each meetup date, a difference-in-differences could be calculated, assessing the effect of the meetup in both the short and long term.
- Results show that attending an offline meetup has a positive effect on the contribution behaviour of users. While it is not necessarily the case that users increase their contributions after a meetup in comparison to before the meetup, their reduction in contributions is smaller than the reduction a comparable control group experiences (more detailed results are also in the WikiWorkshop paper).
- In the short term, I find:
- Users who have not edited in the week before a meeting, edit with a predicted probability of 16% the week after the meeting.
- Users who have not edited in the week before a meeting and attended the meeting, edit with a predicted probability of 36% the week after the meeting (20 pp more than the control group).
- Users who have edited in the week before a meeting make on average 0.03 edits less in the week after the meeting.
- Users who edited in the week before a meeting and attended the meeting make on average 0.01 edits less in the week after the meeting (0.02 edits more than the control group).
- When including more variables than just comparing meetup attendees with non-attendees, I also find that users are more likely to edit and edit more if they have been to a work-related meeting (compared to a social meeting); that after the first meeting, users edit more in all namespaces (not mainspace); and that administrators are generally more active, but have a smaller positive effect from meeting participation.
- In the long term, I find:
- Users who have not edited in the year before a meeting, edit with a predicted probability of 6% the year after the meeting.
- Users who have not edited in the year before a meeting and attended the meeting, edit with a predicted probability of 31% the year after the meeting (25 pp more than the control group).
- Users who have edited in the year before a meeting make on average 12.5 edits less in the year after the meeting.
- Users who edited in the year before a meeting and attended the meeting make on average 3.8 edits less in the year after the meeting (8.7 edits more than the control group).
- Results further show that attendees become more likely to collaborate with each other after a meeting, and there is no evidence that collaboration shifts towards users they have met at a meetup at the expense of those they have not met.
- All effects, especially regarding collaboration behaviour, are rather small.
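The difference-in-differences comparisons in the list above come down to subtracting the control group's change from the attendees' change. The sketch below reproduces that arithmetic using only the figures reported above, and adds a sign-preserving cube root of the kind mentioned for the after-minus-before change. The function names are hypothetical, the matching and modelling steps are not shown, and the example value passed to the cube root is made up for illustration.

```python
def signed_cube_root(x):
    """Cube root that preserves the sign, so a drop in activity stays negative."""
    return -((-x) ** (1 / 3)) if x < 0 else x ** (1 / 3)


def diff_in_diff(change_attendees, change_control):
    """Difference-in-differences: attendee change minus control-group change."""
    return change_attendees - change_control


# Previously inactive users: nobody edited before, so the predicted probability
# of editing afterwards is itself the change (figures from the list above).
short_term_pp = diff_in_diff(0.36, 0.16) * 100
long_term_pp = diff_in_diff(0.31, 0.06) * 100

# Previously active users: average change in edits (figures from the list above).
short_term_edits = diff_in_diff(-0.01, -0.03)
long_term_edits = diff_in_diff(-3.8, -12.5)

print(round(short_term_pp), round(long_term_pp))
print(round(short_term_edits, 2), round(long_term_edits, 1))

# Illustrative only: a user whose edits dropped by 8 between the two periods
print(round(signed_cube_root(-8), 6))
```

The rounded results match the reported gaps of 20 and 25 percentage points and of 0.02 and 8.7 edits relative to the control group.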
Analysis of norm-relevant behaviour
I conceptually replicated and extended the study of Piskorski and Gorbatai (2017) who tested to what extent the density of a user's online collaboration network is of relevance in regard to norms [3]. This built on the argument put forward by James Samuel Coleman[4]. I tested to what extent the density of a user's offline network is relevant in explaining their norm-relevant behaviour.
- Describing the reverting behaviour of Wikipedians in the German Wikipedia shows a strong increase in the usage of the feature in the first few years of Wikipedia. In 2001, the first year of the German Wikipedia, four edits were reverted. The number of reverts increased until around 2007, then stabilised, and started to decrease in 2011. Since around 2014, it has been at a stable level of around 400'000 reverts each year.
- Most years, the majority of reverts are done by registered users who revert IPs. Similar to the development of the total number of reverts, the number of such User > IP-reverts increased in the early years but has been on a decreasing trend in the recent past. Instances in which a user reverts another user also occur relatively often. These cases had been increasing in the early years and remain on a relatively stable level since 2007 with around 200'000 reverts per year. Instances in which IPs revert other IPs or users are comparatively rare, particularly the former.
- Focusing on the proportion of edits reverted by contributor type reveals, for example, that on average 17% of all edits made by IPs are subsequently reverted. Across the years, less than 1% of edits by bots are reverted and around 2.5% of edits made by registered users (with an increase in the early years of Wikipedia and stable numbers since 2008). The proportion of reverted edits made by IPs increased up to 2010 (the peak year, when around 24% of edits made by IPs were reverted) and has decreased slightly since.
- In my analysis, I make use of the thanking feature as a measure of rewards. The feature was introduced in 2013, and up to 2020, 754'526 instances of one user thanking another were recorded in the log. Since its introduction at the end of 2013, the feature has been increasingly used over the years, with well over 100'000 thanks given per year.
- Across all years, 29'164 unique users have thanked others and 40'035 users have received thanks. It is thus not a feature which is used by the majority of Wikipedians. One user has thanked others 17'785 times, while others have only used the feature once.
- On average, a user who has used this feature at least once thanked others 25.67 times (median 2, standard deviation 194.86). 17.7 per cent of thanks were given by those that were, are, or will be administrators, and on average, users that use the feature have been active for 7.39 years (median 7.90, standard deviation 4.48, minimum 0, maximum 18.39).
- Regarding receiving thanks, users who have received at least one thank you, have on average received 18.70 thanks (median 1, standard deviation 170.24, minimum 1, maximum 24'103). 12.0 per cent of thanks were given to those who were, are, or will be administrators, and, on average, users that received thanks have made their first edit 8.10 years ago (median 8.77, standard deviation 4.29, minimum 0.00007, maximum 18.50).
- In some instances, the same pair of users have thanked each other multiple times. While, on average, one user thanked another one 2.00 times (median 1), the maximum is a high of 663 times (standard deviation 5.42).
- In my conceptual replication of the study of Piskorski and Gorbatai (2017), I found only partial support of the argument of James Coleman and only limited importance of the offline network: Actors embedded in dense (online) networks are less frequently the victim of norm violations or violate norms themselves (supportive of theoretical argument). I find negative effects of an actor's online network density on their likelihood to punish norm violators on behalf of others and to experience such punishments of others themselves (no support). Regarding rewards: When focusing on all users, I find that users who have previously conducted norm punishments receive more rewards but there is no evidence of a positive effect of network density on giving rewards.
- Those attending meetups at all tend to experience both fewer norm violations and norm punishments, and they give and receive more rewards. However, the density of the offline network does not play a noteworthy role in explaining online norm violation and norm enforcement.
- Generally, the results are rather mixed and difficult to explain. I hope to make more sense of this in the future and thus focused more on the descriptives in this report.
Analysis of voting behaviour
When analysing the voting behaviour of Wikipedians, I was interested in explaining who is nominated and elected to become administrator, as well as who votes in these elections. The focus lay in understanding what role offline meetups play in this context.
- I collected all 1213 elections organised on the German language Wikipedia until the end of March 2020 (including re-elections; I excluded elections without eligible candidates in the analyses).
- The first election recorded took place on April 9, 2003 and did not have any recorded voters, and the last one ended on March 16, 2020 after 257 users voted. Both elections led to a new administrator.
- In total, 60 per cent of elections were successful and the candidate became a new or re-elected administrator.
- The number of elections peaked in the early years of the German Wikipedia and decreased across the years.
- The proportion of elections with a successful outcome remains relatively similar across the years.
- The number of voters per election varies from 0 (in the early days of Wikipedia) to 533 with a mean of 168.35 (median 165, standard deviation 110.91).
- The number of votes has increased steadily in the first years of the German Wikipedia and has remained stable, attracting around 200-300 voters per election. In 2003, when there was no real election procedure in place, no votes were being cast and counted.
- Across all elections, the number of supporting votes is, with a mean of 113.16 (median 99, standard deviation 88.92, minimum 0, maximum 400), much higher than those of the opposing votes with a mean of 40.54 (median 24, standard deviation 43.35, minimum 0, maximum 257).
Generally, there is a notable overlap between people that attend meetups and people that take part in elections (focusing on the year before the election):
- Eligible voters: In 7% of cases, they attended a meetup.
- Voters: In 36% of cases, they attended a meetup.
- In 3% of cases, they met the candidate personally.
- In 36% of cases, they met another voter personally.
- Candidates: In 37% of cases, they attended a meetup.
- In 40% of cases, users who know the candidate personally voted.
- In 90% of cases, voters who know the candidate personally voted in favour of them.
- In 7% of cases, voters who know the candidate personally voted against them.
- Successful candidates: In 45% of cases, they attended a meetup.
In the following, I will present results regarding the four main questions I investigated. I keep the statistical details short, but I am happy to explain more if needed (e.g. regarding subsampling, modelling approach, control variables etc.; the details can also be read in the thesis).
- To what extent does participation in offline meetings influence the decision of editors to stand for election?
- Approach: Compare those running to become administrator with those not running (as well as to analyse within-user changes over time).
- 9014 observations of 3973 users (247 instances of candidacy); subsampled dataset
- Analysis on the level of years
- Dependent variables: Ran as administrator in a given year (0/1)
- Estimation of multilevel within-between linear probability models
- Exclusion of re-elections
- Bivariate results (relationship between two variables):
- The more meetings a user has attended in the last year, the more likely that person is to stand as an administrator.
- The more voters a user has met in the last year, the more likely that person is to stand as an administrator.
- The more central a user is in the meetup network, the more likely that person is to stand as an administrator.
- Multivariate results (controlling for additional variables):
- There are no significant effects of offline interactions, except the within variation of the number of other users met: The more users a user has met in the past year, the more likely the user is to run for administrator.
- However, the effect is small: it is more important to collaborate with and talk to other users.
- Bivariate effects found regarding the other variables disappear when controlling for other variables.
- To what extent is the result of administrator elections influenced by participation in offline meetings?
- Approach: Compare successful candidacies with unsuccessful ones.
- 1191 observations of 756 users (718 became administrator)
- Dependent variables: Became administrator (0/1)
- Estimation of linear probability models with clustered standard errors
- Bivariate results (relationship between two variables):
- The more meetings a user has attended in the last year, the more likely that candidate is to be successful in an election.
- The more voters a user has met in the last year, the more likely that candidate is to be successful in an election.
- The more central a user is in the meetup network, the more likely that candidate is to be successful in an election.
- Multivariate results (controlling for additional variables):
- Positive and significant effect of the proportion of voters met: Having met 1 per cent more of the voters leads to a 2.7 per cent increase in the probability to win the election.
- Both the effects of the bare number of meetups attended and the eigenvector centrality of a candidate are positive and significant, unless the number of voters met is included in the model simultaneously.
- The control variables suggest that more active users and those who have been registered longer are more likely to win in elections.
- To what extent does participation in offline meetings influence the decision of editors to vote in an administrator election?
- Approach: Compare those voting at the election with those not voting (as well as to analyse within-user changes over time).
- 996'668 observations of 13'979 users (126'615 votes); subsampled dataset
- Dependent variables: Voted (0/1)
- Estimation of multilevel within-between linear probability models
- Bivariate results (relationship between two variables):
- If a user has met a candidate in the last year, they are more likely to vote in an election.
- The more meetings a user has attended in the last year, the more likely that user is to vote in an election.
- The larger the proportion of voters a user has met in the last year, the more likely that user is to vote in an election.
- The more central a user is in the meetup network, the more likely that user is to vote in an election.
- Multivariate results (controlling for additional variables):
- I find significant and positive effects of having met the candidate: users who have met candidates generally and also specifically the candidate of one election are more likely to vote.
- There is no significant effect of the number of meetings attended.
- Having met a larger proportion of voters in an election significantly increases a user's predicted probability to vote.
- The control variables further suggest that users who have made more edits are more likely to vote, and that users are more likely to vote if the candidate at the election has edited more than them.
- Those users that have been reverted by the candidate or reverted the candidate themselves are more likely to vote.
- Both having collaborated and talked with each other increases the probability to vote as well.
- To what extent does participation in offline meetings influence the decision of editors to vote in support of an administrator election?
- Approach: Compare those voting in support with those voting in opposition (as well as to analyse within-user changes over time); exclusion of neutral votes
- 115'608 observations of 2939 users (87'519 pro votes); subsampled dataset
- Dependent variables: Voted supportively (0/1)
- Estimation of multilevel within-between linear probability models
- Bivariate results (relationship between two variables):
- If a user has met a candidate in the last year, they are more likely to vote supportively in an election.
- The more meetings a user has attended on average (i.e. the more they generally attend meetings), the more likely that user is to vote supportively in an election.
- The larger the proportion of other supporting voters a user has met in the last year, the more likely that user is to vote supportively in an election.
- The smaller the proportion of anti-voters a user has met in the last year, the more likely that user is to vote supportively in an election.
- The more central a user is in the meetup network on average, the more likely that user is to vote supportively in an election.
- Multivariate results (controlling for additional variables):
- I find a significant between effect of attending meetings, suggesting users that have, on average, attended more meetings, are generally more likely to vote supportively.
- I find significant and positive effects of the proportion of supporting voters met and negative and significant effects of the proportion of opposing voters met.
- The control variables further suggest that the editing behaviour of the voter plays only a small role; there is only a significant effect of the recent mainspace activity.
- However, there is a significant and positive effect of the difference in the number of total edits with users being more likely to vote supportively if the candidate has more edits than they themselves.
- Users registered for longer are more likely to vote positively.
- Users that have been reverted by the candidate or have themselves reverted the candidate are less likely to support them.
- Having collaborated and talked with each other increases the probability to vote supportively.
- Having collaborated with or talked to other voters generally exhibits the same effects as having met voters; there is a positive effect of sharing ties with pro-voters and a negative effect of sharing ties with anti-voters.
- Overall, I found that offline participation measures only weakly influence whether a user runs for administrator in a given year.
- To a greater extent, the offline network affects whether one is successful as a candidate, whether one votes, and whether one votes supportively: the larger the proportion of voters a candidate has met, the more likely they are to win and the higher the proportion of other voters a user has met, the more likely they are to vote themselves (this also holds true for the direction of votes: the more pro-voters a user knows, the more likely they also vote supportively, and the more anti-voters they have met, the less likely they vote supportively).
- Users are also more likely to vote if they have met the candidate, and they tend to support those more central in the meetup network.
- This highlights that taking part in elections, either passively as a voter or actively as a candidate, is influenced, among many other things, by meetup participation.
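To make the recurring predictor "proportion of voters met" more tangible, here is a minimal Python sketch of how such a measure could be derived from meetup attendance lists. It is an illustration only and not the code used in the thesis: the function names, the data layout (a list of date and attendee-set pairs), the 365-day look-back window and the toy data are all assumptions.

```python
from datetime import date, timedelta


def users_met(user, meetups, before, window_days=365):
    """Users who shared at least one meetup with `user` in the window before `before`.

    `meetups` is a list of (date, set_of_attendees) pairs (hypothetical layout).
    """
    start = before - timedelta(days=window_days)
    met = set()
    for day, attendees in meetups:
        if start <= day < before and user in attendees:
            met |= attendees - {user}
    return met


def proportion_of_voters_met(user, voters, meetups, election_start):
    """Share of the other voters in an election that `user` met in the look-back window."""
    others = set(voters) - {user}
    if not others:
        return 0.0
    met = users_met(user, meetups, election_start)
    return len(met & others) / len(others)


# Invented toy data: three meetups, one of them outside the look-back window.
meetups = [
    (date(2019, 6, 1), {"A", "B", "C"}),
    (date(2019, 11, 15), {"A", "D"}),
    (date(2017, 1, 10), {"A", "E"}),  # too old to count
]
voters = {"B", "C", "D", "E", "F"}

share = proportion_of_voters_met("A", voters, meetups, date(2020, 3, 1))
print(share)  # 0.6 (A met B, C and D out of five voters)
```

The same helper could be applied to pro-voters and contra-voters separately to obtain the two proportions used in the analysis of vote direction.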
Methods and activities
Please provide a list of the main methods and activities through which you completed your project.
The past 12 months of the project continued my work from the previous two years. On the one hand, I conducted research work on my own to bring the academic research (and the research thesis) forward, and on the other hand, I have started (and am continuing) sharing the results through several channels. I worked on three topical domains: (1) productivity and collaboration, (2) norm-relevant behaviour/reverting, and (3) elections, and have answered the research questions laid out in all of these domains. In the following, I will describe how I analysed and reported on my findings so far. I will mention the statistical methods used only briefly; if anyone is interested in any part in particular, please raise it on the discussion page and I am more than happy to give more detail.
Generally, regarding meetup data, I completed the following tasks:
- Collected, pre-processed and cleaned data (started task before project grant).
- I published a learning pattern on how to collect this sort of data.
- Conducted descriptive analysis.
- I constructed basic variables to better understand meetups on Wikipedia and get key numbers: Where and when do they take place, how many users take part in these meetings, what does the network that develops between meetup goers look like?
- I summarised information about the meetups on Wikipedia to highlight problems and dynamics and to generally paint a richer understanding of meetups on Wikipedia.
- I prepared the data to make it openly shareable via the OpenScienceFramework.
- Started writing a data brief manuscript to introduce the data and highlight its sociological potential. This should make the meetup data more accessible for computational social scientists.
Regarding productivity and collaboration:
- I pre-processed and cleaned data.
- Cleaning in particular included substantial wrangling with user names, their encodings and name changes.
- I wrote a learning pattern on how to deal with the data dumps.
- Conducted descriptive analysis.
- Conducted inferential analysis of productivity and collaboration.
- Wrote up results on productivity and collaboration.
- My main interests were finding out whether partaking in meetings had a positive effect on how much users contribute in the future, and whether it affected whom one collaborated with.
- To assess a (ideally causal) effect of meetups, I constructed a comparable control group of users with a similar pre-meetup activity level and pattern, as well as a similar registration date, so that I could compare meetup attendees with this matched control group.
- I follow a difference-in-differences approach to analyse the data.
- I use linear probability models and linear regressions with robust standard errors to assess the effects of meetup attendance.
- I conducted several robustness checks using generalised linear models and a lagged dependent variable approach.
- I presented the results on productivity at the WikiWorkshop 2022.
- I have started to outline a potential journal publication.
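The logic of comparing meetup attendees with a matched control group in a difference-in-differences design can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the models reported in the thesis: it computes only the raw point estimate (no regression, no robust standard errors, no control variables), and the function name and toy numbers are invented.

```python
from statistics import mean


def did_estimate(rows):
    """Raw difference-in-differences estimate.

    `rows` holds (attended_meetup, edits_before, edits_after) per user
    (hypothetical layout). The estimate is the mean change among attendees
    minus the mean change among matched controls.
    """
    treated = [after - before for attended, before, after in rows if attended]
    control = [after - before for attended, before, after in rows if not attended]
    return mean(treated) - mean(control)


# Invented toy data: two attendees and two controls with similar pre-meetup activity.
rows = [
    (True, 40, 52),   # attendee, change +12
    (True, 10, 13),   # attendee, change +3
    (False, 38, 41),  # control,  change +3
    (False, 12, 12),  # control,  change 0
]

effect = did_estimate(rows)
print(effect)  # 6.0 (7.5 mean change for attendees minus 1.5 for controls)
```

Subtracting the change in the control group removes trends in activity that affect both groups alike, which is why the matched controls need a similar pre-meetup activity pattern.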
Regarding reverting behaviour:
- I pre-processed and cleaned data.
- Conducted descriptive analysis to better understand who reverts whom etc.
- Conducted inferential analysis of reverting behaviour.
- Wrote up results on reverting behaviour.
- I tried to replicate a previous study conducted by Piskorski and Gorbatai which was published in the American Journal of Sociology in 2017 [5].
- In line with this previous research, I concentrate on one year of activity on Wikipedia and check to what extent the number of norms violated, norm punishments conducted and rewards given and received depend on a user's network (in my case, their online and offline network).
- I made use of the features of Wikipedia which allow users to undo changes which do not conform to the norms and rules of the website, and which allow users to thank others for edits made. I diverged from the variable operationalisations of Piskorski and Gorbatai (2017).
- I use multilevel negative binomial models to analyse the data (which have a relatively good fit).
- I conducted several robustness checks using different definitions of norm-relevant edits or network ties, etc.
- Presented results on reverting behaviour at a German conference of network researchers.
Regarding election behaviour:
- Collected, pre-processed and cleaned data.
- I published a learning pattern on how to collect data from requests for adminship.
- Conducted descriptive analysis to better understand the election process on Wikipedia and how it changed over time.
- Conducted inferential analysis of voting behaviour.
- I focused on four different explananda: 1) Who is running to become administrator, 2) who is winning in elections, 3) who is voting in elections, and 4) who is voting supportively in elections.
- Theoretically, I embed voting on elections into classical voting theories and draw parallels to public assembly voting.
- I use multilevel within-between linear probability models to analyse the data.
- I conducted several robustness checks using fixed-effects models and generalised linear models.
- Started writing a manuscript to submit to a journal.
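For readers unfamiliar with within-between models: the key preparation step is to split each time-varying predictor into a user-specific mean (the "between" part) and the deviation from that mean in a given observation (the "within" part), and to enter both into the model. The Python sketch below shows only this decomposition, not the model estimation; the function name, the data layout and the toy values are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def within_between(observations):
    """Split a time-varying predictor into between and within components.

    `observations` is a list of (user, year, value) tuples (hypothetical layout),
    e.g. the number of meetups a user attended in a given year.
    """
    by_user = defaultdict(list)
    for user, _, value in observations:
        by_user[user].append(value)
    user_mean = {user: mean(values) for user, values in by_user.items()}
    return [
        {
            "user": user,
            "year": year,
            "between": user_mean[user],         # user's average over all years
            "within": value - user_mean[user],  # deviation in this year
        }
        for user, year, value in observations
    ]


# Invented toy data: user A attends more meetups in 2019 than in 2018, user B never attends.
obs = [("A", 2018, 2), ("A", 2019, 6), ("B", 2018, 0), ("B", 2019, 0)]
parts = within_between(obs)
for row in parts:
    print(row)
```

The coefficient on the within component then describes what happens when the same user attends more than usual, while the coefficient on the between component compares users who differ in their average attendance.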
On general outreach:
- I presented part of the project at several conferences: I discussed it at a PhD conference of the University of Warwick, at the WikiWorkshop 2022, at the Sunbelt social network conference, and a meeting of German social network scientists. I have also been invited to and discussed it in a research seminar series at the University of Mainz.
- I guest lectured about Wikipedia and my own research as part of a module on digitalisation at the University of Bern twice. This module is part of a certificate of advanced studies aimed at professionals interested in the field of sustainable development.
- I reported on the project in Der Kurier, the German Wikipedia's community forum, and have presented at the Digitaler Themenstammtisch (I will repeat the latter next January).
- I published several guidelines on data collection and analysis for reproducibility and scalability. I used the format of learning patterns.
- I developed a publication strategy and started to write manuscripts for academic journals.
Project resources
Please provide links to all public, online documents and other artifacts that you created during the course of this project. Even if you have linked to them elsewhere in this report, this section serves as a centralized archive for everything you created during your project. Examples include: meeting notes, participant lists, photos or graphics uploaded to Wikimedia Commons, template messages sent to participants, wiki pages, social media (Facebook groups, Twitter accounts), datasets, surveys, questionnaires, code repositories... If possible, include a brief summary with each link.
- PhD thesis / permanent link via Webcat
- Dataset on offline meetups available on OSF
- Details on the dataset, preprint on SocArXiv; published version in the Journal of Computational Social Science (OA)
- Dataset on elections available on OSF
- Election analysis, preprint on SocArXiv
- WikiWorkshop 2022 paper
- Article in Der Kurier
- Learning patterns
- Code snippets regarding learning patterns
- Summary of the presentation at the digital discussion round
- Presentation for DCW Conversation Hour, March 2023.
- All figures uploaded in this final report:
- https://meta.wikimedia.org/wiki/File:Meetingtypes.png
- https://meta.wikimedia.org/wiki/File:Wikipediademeetupsglobe.png
- https://meta.wikimedia.org/wiki/File:Attendeespermeetup.png
- https://meta.wikimedia.org/wiki/File:Meetupsperattendee.png
- https://meta.wikimedia.org/wiki/File:Degreeofusers.png
- https://meta.wikimedia.org/wiki/File:Densitydayssincefirstmeeting.pdf
- https://meta.wikimedia.org/wiki/File:Changeofactivity_barplot.pdf
- https://meta.wikimedia.org/wiki/File:Revertsovertime_barplot2.pdf
- https://meta.wikimedia.org/wiki//File:Reverts_contributor_type.png
- https://meta.wikimedia.org/wiki/File:Thanksovertime_barplot.pdf
- https://meta.wikimedia.org/wiki/File:Electionacrosstime.pdf
- https://meta.wikimedia.org/wiki/File:Boxplotvotesyear.pdf
- https://meta.wikimedia.org/wiki/File:Meetingscandidatepredprob.png
- https://meta.wikimedia.org/wiki/File:Metuserscandidatepredprob.png
- https://meta.wikimedia.org/wiki/File:Successfulcandmeetingspredprob.png
- https://meta.wikimedia.org/wiki/File:Successfulcandvotersmetpredprob.png
- https://meta.wikimedia.org/wiki/File:Votingmetcandidatepredprob.png
- https://meta.wikimedia.org/wiki/File:Votingmeetingspredprob.png
- https://meta.wikimedia.org/wiki/File:Votingvotersmetpredprob.png
- https://meta.wikimedia.org/wiki/File:Votingprometcandidatepredprob.png
- https://meta.wikimedia.org/wiki/File:Votingproprovotersmetpredprob.png
- https://meta.wikimedia.org/wiki/File:Votingproantivotersmetpredprob.png
- All figures uploaded in previous reports and not re-used in the final report:
Learning
The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you took enough risks in your project to have learned something really interesting! Think about what recommendations you have for others who may follow in your footsteps, and use the below sections to describe what worked and what didn’t.
What worked well
What did you try that was successful and you'd recommend others do? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.
As one of my aims was to create guidelines to increase the potential for reproducibility and scalability of this project, I created multiple learning patterns:
- Collecting data on offline meetups
- Collecting data on requests for adminship
- Analysing effects of offline meetups
What didn’t work
What did you try that you learned didn't work? What would you think about doing differently in the future? Please list these as short bullet points.
- Many things did not work out at first, but I found ways to deal with the problems (also highlighted in the learning patterns, e.g. using toy examples when dealing with big data).
- Other issues revolve around the specific nature of my research: It was challenging to set reasonable, well-justifiable and computationally feasible definitions of what "collaboration", "norm enforcement", "norm violation" etc. really mean.
- Generally, open access is not as straightforward as I would have hoped, given the constraints posed by publishers etc.
Other recommendations
If you have additional recommendations or reflections that don’t fit into the above sections, please list them here.
- Based on the results of my research project, I believe it is important to continue research on offline meetups, in particular to better understand the relationship between offline meetups and voting behaviour.
Next steps and opportunities
Are there opportunities for future growth of this project, or new areas you have uncovered in the course of this grant that could be fruitful for more exploration (either by yourself, or others)? What ideas or suggestions do you have for future projects based on the work you’ve completed? Please list these as short bullet points.
While I completed most of my tasks as outlined, this project is still ongoing and I am directly continuing with these steps:
- I am currently in the stage of writing up journal articles to publish the results in suitable outlets. I am striving for open-access publications (depending on funding). In any case, I will make pre-print versions available.
- I will make my data freely available as part of the journal publications and try to publish dedicated articles with dataset descriptions. I have the data prepared and ready to be shared, but I want to know who is accessing it until I have published my research myself. For now, I am (very!) happy to share it on request.
- I will continue to share the results (for example at the Digitaler Themenstammtisch in January).
More broadly, there are several avenues for future research which I consider fruitful (see also the concluding sections and concluding chapter in my thesis):
- When analysing productivity of Wikipedians, I focused on the number of edits in certain timeframes. Future research should focus on the size and, if possible, quality of edits. Productivity can further come in different forms which could be analysed separately (contributions towards different namespaces, contributions towards commons, etc.).
- Generally, qualitative research can help to better understand and contextualise the results I have obtained. Motivations of Wikipedians and personal reasonings can better be uncovered through interview material for example. The quantitative findings can form a starting point for in-depth qualitative research.
- In my research, I focused on twenty years of Wikipedia history. Future research could for example also focus on single years and better trace a development over time. Also, one could focus on specific geographic areas/specific meetups to understand their dynamics in detail.
- I have focused on effects of meetings. In a next step, it would be important and interesting to analyse who is going to meetings and what such careers as "Wikipedia meetup-goers" look like. Who are the people who go to only one meeting, who are the people who go to 20, 50, 100 meetings? What is keeping people from attending?
- I focused on the German Wikipedia. It would be interesting to conduct the same analyses in other language versions.
Part 2: The Grant
Finances
Actual spending
Please copy and paste the completed table from your project finances page. Check that you’ve listed the actual expenditures compared with what was originally planned. If there are differences between the planned and actual use of funds, please use the column provided to explain them.
Expense | Approved amount | Actual funds spent | Difference |
Research sponsorship | $26000 | $26000 | $0 |
Total | $26000 | $26000 | $0 |
Remaining funds
Do you have any unspent funds from the grant?
Please answer yes or no. If yes, list the amount you did not use and explain why.
- No.
If you have unspent funds, they must be returned to WMF. Please see the instructions for returning unspent funds and indicate here if this is still in progress, or if this is already completed:
- No unspent funds.
Documentation
Did you send documentation of all expenses paid with grant funds to grantsadmin wikimedia.org, according to the guidelines here?
Please answer yes or no. If no, include an explanation.
- Yes.
Confirmation of project status
Did you comply with the requirements specified by WMF in the grant agreement?
Please answer yes or no.
- Yes.
Is your project completed?
Please answer yes or no.
- Yes and no. I completed most of the tasks as outlined. As a funded grant project, this project is completed apart from final publications. As a research project itself, the project is ongoing and I am still following up on the tasks not yet fulfilled (see section on Next steps and opportunities).
Grantee reflection
We’d love to hear any thoughts you have on what this project has meant to you, or how the experience of being a grantee has gone overall. Is there something that surprised you, or that you particularly enjoyed, or that you’ll do differently going forward as a result of the Project Grant experience? Please share it here!
I can reiterate what I have said in my midpoint review: It has been a great experience! I have enjoyed working on this research project and becoming part of the Wikimedia research community. I look forward to continuing my contributions.
Miscellaneous follow-ups
References
- ↑ Piskorski, Mikołaj; Gorbatâi, Andreea (2017). "Testing Coleman's Social-Norm Enforcement Mechanism: Evidence from Wikipedia". American Journal of Sociology 122 (5): 1183–1222. doi:10.1086/689816.
- ↑ Coleman, James (1990). Foundations of Social Theory.
- ↑ Piskorski, Mikołaj; Gorbatâi, Andreea (2017). "Testing Coleman's Social-Norm Enforcement Mechanism: Evidence from Wikipedia". American Journal of Sociology 122 (5): 1183–1222. doi:10.1086/689816.
- ↑ Coleman, James (1990). Foundations of Social Theory.
- ↑ Piskorski, Mikołaj Jan; Gorbatâi, Andreea (2017). "Testing Coleman's social-norm enforcement mechanism: Evidence from Wikipedia". American Journal of Sociology 122 (4): 1183–1222.