Overview of Program Reports. Find out more about the purpose and goals of the Wikimedia Programs Report, as well as the overall response rate and data limitations.
Purpose and goals
This initial version of the Evaluation Report aims to provide the Wikimedia community with a first look at data collected from community-run programs around the world and to identify opportunities to further support the community with program evaluation and design. The report includes data from the first Data Collection Survey in addition to data pulled from online tools that are also available to the community.
The goals of this initial report are:
To serve as a baseline, or starting point, for metrics and data reporting, with the hope that program leaders in the Wikimedia community will be inspired to collect and report data that can help their programs reach their identified goals.
To enable the Program Evaluation and Design team and the Wikimedia community to use this pilot report to explore methods for improving the collection and reporting of data, learnings that can be applied to the next data collection survey and report. We want to make evaluation and learning easy and fun for program leaders.
23 program leaders voluntarily reported on 64 programs they produced. Our team removed six reported programs because the numbers that were shared could not be disaggregated and confirmed, bringing the usable total to 58. In addition, one program leader sent the Program Evaluation & Design team a list of cohorts from six workshops they produced, for which we pulled the data ourselves; that data is included in this report. To expand collection, we also mined data for 61 additional program implementations from information that was publicly available on wiki (e.g. reports, event pages). This mined data comprises 51% of the data used in this report and increased our collection of output and outcome data, helping to fill gaps that the survey responses did not cover. In total, 119 program implementations are included in this report.
The survey had a low response rate, which resulted in a small amount of reported data and high variability in the data that was reported.
Because of this, the report includes means, response ranges, standard deviations, and medians. Given the wide range of numeric responses, few values repeat, so modes are reported only selectively rather than for every measure.
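Because the mode behaves differently from the other statistics when responses rarely repeat, a short illustration may help. The Python sketch below uses made-up numbers, not actual survey responses, to show the summary statistics the report relies on and why the mode carries little information for widely spread data.

```python
# Illustrative sketch only: the figures below are hypothetical, not actual
# survey responses. It shows the summary statistics used in this report
# (mean, range, standard deviation, median) and how the mode behaves when
# numeric responses are spread widely.
import statistics

# Hypothetical "participants per event" values reported by program leaders
reported_values = [4, 7, 12, 19, 25, 60, 140]

mean = statistics.mean(reported_values)
median = statistics.median(reported_values)
stdev = statistics.stdev(reported_values)            # sample standard deviation
value_range = (min(reported_values), max(reported_values))
modes = statistics.multimode(reported_values)        # all most-common values (Python 3.8+)

print(f"mean={mean:.1f}, median={median}, stdev={stdev:.1f}")
print(f"range={value_range}, modes={modes}")
# Every value above occurs only once, so multimode() returns the whole list:
# no single mode stands out, which is why the report lists modes only selectively.
```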
Program leaders are not consistently reporting program budgets and staff/volunteer implementation hours.
Even those who have been tracking their inputs, outputs, and/or outcomes have done so with varying consistency and levels of detail. For example, while many program leaders track their budgets, they often track only the overall budget rather than itemized costs, so details about how much particular parts of a program cost are lacking. Of the 59 programs reported on directly by program leaders, 64% included a budget report; a further 22% (13 of the 59) reported no budget but did report hours invested.
Most program leaders who responded to the survey were able to estimate the staff hours (51% of reports) and volunteer hours (81% of reports) that went into implementing a program, but very few were able to report exact hours (7% for staff hours and 5% for volunteer hours). In total, 89% of program leaders reported some type of data about hours, with volunteer hours reported most often, for 86% of the implementations reported directly.
Out of the data we mined from public records, the only programs with available budget information were the 24 Wiki Loves Monuments events (44% of mined data); however, the Wiki Loves Monuments data we mined did not provide any staff or volunteer hours. The other 30 programs we mined had no publicly reported budget or hours. In short, such report data is lacking for each of the six programs at this time, which also means that we were unable to complete any meaningful cost-benefit analysis.
For content production metrics, only a minority of program leaders were able to report on most measures for their program events (i.e. edit counts, characters added, media uploaded, pages created).
The percentages below show how many program leaders reported each type of data:
63% – photos/media uploaded
39% – edit counts
27% – amount of text added to Wikipedia's article namespace (for most European languages 1 byte = 1 character)
Finally, under half of the respondents (45%) were able to share data about the retention of new or existing editors. 63% of reports contained partial or complete data for budget, hours, and content production.
The team also acknowledges that the timing of its reporting requests may have affected the response rate, as many program leaders were busy wrapping up and reporting on Wiki Loves Monuments 2013.
We had to collect additional data to fill in gaps caused by the low response rate.
In addition to collecting self-reported program data from program leaders, we worked to identify and locate other potential sources of program data. Some program leaders provided us with cohort usernames, event dates, and times, which allowed us to look into their events and fill in certain data gaps. We collected additional data on the following programs:
Edit-a-thons – Edit-a-thons were the most frequently self-reported program type. However, many program leaders did not track participants' usernames, which would have made it possible to track contributions made before, during, and after the event. We pulled additional data on 20 English Wikipedia edit-a-thons for which public records of participants were available on wiki. These names were used as cohorts to track user activity rates 30 days prior to the event, during the event, and 30 days after the event; a minimal sketch of this windowing appears after these program descriptions. This allowed us to examine content production and user retention related to edit-a-thons. We also pulled 30-day prior and after data for two edit-a-thons submitted by program leaders through the survey.
Editing workshops – One program leader submitted usernames, event dates, and program details for six workshops. This allowed us to create cohorts for those six workshops and pull data via Wikimetrics. We pulled data on the cohorts to examine the three- and six-month retention of new users on the cohort list (also illustrated in the sketch below). Some usernames could not be confirmed via Wikimetrics and additional research, but the majority yielded usable data.
On-wiki writing contests – Additional data was pulled for six on-wiki writing contests in three different language Wikipedias. This data, which was publicly available on wiki, included program dates, budget, number of participants, the content that was created or improved, and the quality of that content at the end of the contests. We worked with program leaders, when possible, to confirm volunteer hours and budget. We were unable to judge retention or characters added because of limitations in pulling contest-specific data only.
Wiki Loves Monuments – We used data from three directly reported Wiki Loves Monuments implementations as well as data from 24 Wiki Loves Monuments implementations from 2012 and 2013 that had received Wikimedia Foundation grants (including those from the FDC) and had reported a specific budget for the program, for a total of 27 program implementations of Wiki Loves Monuments across two years. For those 24 implementations, we used publicly available data to gather the number of participants, photos added, photos used, and photos named as Featured, Quality, or Valued images. This data was pulled using three community-built tools: the Wiki Loves Monuments tool by emijrp, and GLAMorous and CatScan 2, both created by Magnus Manske. We also contacted program leaders from the 24 Wiki Loves Monuments implementations to review and confirm the numbers gathered and to contribute additional data regarding budget and donated resources.
Other photo upload initiatives – We tracked down and gathered data for an additional five upload events: three other Wiki Loves events, a Wiki Takes event, and the pilot project Festivalsommer 2013. These programs were selected to expand the amount of data on other photo upload events. The data, based on publicly available information and on direct reports from program leaders, included the number of participants, photos uploaded, photos used, and photos named as Featured, Quality, or Valued Images. The team used the Wiki Loves Public Art tool created by Wikimedia Österreich to pull selected data. For the Festivalsommer project, additional data was acquired through direct interaction with the program organizer.
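The cohort calculations described above, counting a participant's edits 30 days before, during, and 30 days after an event, and checking whether they are still editing three and six months later, reduce to bucketing edit timestamps relative to the event dates. The sketch below is a minimal, hypothetical illustration of that idea in Python; it is not the Wikimetrics tooling the team actually used, and the function names, the retention definition (at least one edit in the 30 days around the 3- or 6-month mark), and the sample data are assumptions made for the example.

```python
# Minimal sketch of the cohort-window calculations described above.
# This is NOT Wikimetrics; it only illustrates counting a participant's edits
# 30 days before, during, and 30 days after an event, and checking whether
# they were still active 3 and 6 months later. All data here is hypothetical.
from datetime import datetime, timedelta

def window_activity(edit_times, event_start, event_end):
    """Count edits in the 30 days before, during, and 30 days after an event."""
    before_start = event_start - timedelta(days=30)
    after_end = event_end + timedelta(days=30)
    counts = {"before": 0, "during": 0, "after": 0}
    for t in edit_times:
        if before_start <= t < event_start:
            counts["before"] += 1
        elif event_start <= t <= event_end:
            counts["during"] += 1
        elif event_end < t <= after_end:
            counts["after"] += 1
    return counts

def retained(edit_times, event_end, months):
    """True if the user made at least one edit in the 30 days leading up to
    the 3- or 6-month mark after the event (one common way to define retention)."""
    mark = event_end + timedelta(days=30 * months)
    return any(mark - timedelta(days=30) <= t <= mark for t in edit_times)

# Hypothetical participant: a few edit timestamps pulled for one username
edits = [datetime(2013, 9, 20), datetime(2013, 10, 12), datetime(2014, 1, 8)]
event_start, event_end = datetime(2013, 10, 12), datetime(2013, 10, 12)

print(window_activity(edits, event_start, event_end))  # {'before': 1, 'during': 1, 'after': 0}
print(retained(edits, event_end, months=3))            # True: edited near the 3-month mark
print(retained(edits, event_end, months=6))            # False in this made-up example
```

In practice the team pulled these figures with Wikimetrics and the on-wiki participant lists described above; the sketch is only meant to make the window logic explicit.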