Metrics Brainstorm Session, WMCON 2014: Summary and Discussion


Page Summary

This report summarizes a discussion on Measuring Program Impact on Projects, started at a brainstorm session during the Wikimedia Conference 2014 in Berlin. The goal of the session was to start a community discussion around the metrics used in the Evaluation Reports (beta) and to get the community's feedback on those metrics: what they tell us, what they do not tell us, and how else we might capture other important measures of program impact. After a brief introduction, workshop participants broke up into two groups: the first brainstormed on programs that focus on producing text, while the second brainstormed on programs that focus on images. After brainstorming other important impact targets for measurement, and possible ways of measuring those targets, each participant was allowed to vote on their highest priorities.

Session Results

The top four priorities for text programs
  • Measuring Quality (81 points across all quality measures)
  • Community Impact (35 points)
  • Needs Assessment (27 points)
  • Program Knowledge (6 points)
The top four priorities for media/image programs
  • Quality (not specific enough) (9 points)
  • Use of Wikimedia images outside of projects (9 points)
  • Proportions of images used (5 points)
  • Geographic (and other) diversity, spread (4 points)

Please visit the discussion page to share your input!

Background

During Open Thursday on April 10 at the Wikimedia Conference 2014, the Wikimedia Foundation's Program Evaluation team, part of Grantmaking's Learning and Evaluation team, hosted a brainstorm session called Measuring Program Impact on Projects to get a sense of what the reports tell us about Wikimedia programs, what they do not tell us, and what else we could measure. Below is a summary of the brainstorm session; the appendices provide a complete list of all the topics covered.

Evaluation Report Metrics

The Evaluation Reports (beta) were completed in early April of 2014 and are an initial attempt to systematically assess the various programs that Wikimedia organizations and individual volunteers are running around the world. These reports use a set of metrics to capture the goals of the programs as well as their inputs, outputs, and outcomes. At the beginning of "Measuring Program Impact on Projects," we reviewed the metrics used in the reports. Here, we present a comprehensive summary of the metrics:

Program Goals
Program Inputs

  • Budget
  • Staff and Volunteer Hours *adding number of staff and volunteers and valuation of hours
  • Donated Resources (Meeting Space, Materials/Equipment, Food, Prizes/Giveaways) *adding valuation of donated resources

Program Outputs

  • Event length (Start to End date/time)
  • Count of participants and/or partners (i.e., instructors, institutions, GLAM partners)
  • Count of new user accounts created
  • Count of bytes added and/or file pages created during the event

Program Outcomes

  • Count of articles created or improved (note: this is more an output, but is seen as an important end goal for many program leaders)
  • Count of photos and/or pages of text added
  • Usage and quality ratings of content added
      • For images, counts of: Unique images used, Quality Images, Valued Images, and Featured Pictures
      • For text, counts of: Good Articles and Featured Articles
  • Recruitment and retention of “active” new users
      • Count of surviving editors
      • Count of “active” editors
      • Split for new vs. existing users
  • Implementation Sharing/Program Replication:
      • Program run by experienced program leader
      • Program blogs or informative online posts
      • Program brochures and printed materials
      • Program has guide or instructions

Text and Media Categories

For this brainstorm, we separated programs into two categories: those that focus on "text" and those that focus on "media", which surfaced as the two content targets among the programs analyzed in the Evaluation Reports (beta).

Text programs
  • Edit-a-thons
  • Editing workshops
  • On-wiki writing contests
  • Wikipedia Education Program
Media programs
  • Wiki Loves Monuments
  • Other photo initiatives
  • GLAM content partnerships



Brainstorm Structure

  • 15 minutes: A review of the metrics listed above.
  • 45 minutes: The entire group broke out into two separate rooms. In one room, programs that focus on content (i.e., text) were discussed, while in the other, media/image programs were discussed. Using a sticky wall, participants discussed and offered suggestions in each room. The three main categories for discussion were "What do they tell us?", "What do they not tell us?", and "How else might we measure?", where "they" refers to the metrics used in the reports. After topics and ideas were posted, participants had the opportunity to vote on the various ideas presented.
  • 30 minutes: The Media group joined the Text group in their room, where the Text group gave an overview of the topics covered. The Media group then got a chance to vote on the Text group topics. This process was repeated when both groups went to the Media group's room.

Voting Rubric

The voting process for each of the two groups was done differently:

Text programs

Each participant was allowed one of each of the following sticky dots:

Red = 3 points, high priority
Yellow = 2 points, medium priority
Green = 1 point, lower priority
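
The rubric above amounts to a weighted sum over dot counts. The following is a minimal Python sketch, assuming votes are tallied as red, yellow, and green counts per item; the names `POINTS` and `total_points` are illustrative and were not part of the session materials. The example counts are taken from the Appendix 1 row for item 13.

```python
# Point values from the text-group voting rubric.
POINTS = {"red": 3, "yellow": 2, "green": 1}


def total_points(red: int, yellow: int, green: int) -> int:
    """Return the weighted score for one item from its dot counts."""
    return red * POINTS["red"] + yellow * POINTS["yellow"] + green * POINTS["green"]


# Item 13 in Appendix 1 received 9 red, 3 yellow, and 2 green dots.
print(total_points(red=9, yellow=3, green=2))  # 35
```

The same function reproduces the "Total points" column for the other voted rows in Appendix 1, for example 0 red, 8 yellow, and 3 green dots give 19 points.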

Media programs

Each participant was given two sticky dots to place on the two highest priority items, without differentiating among the colors.

Results

Participation

Although no formal head count was taken during the event, about 27 individuals attended, not including program facilitators: 13 to 15 in the Text group and 12 or so in the Media group.

Text program priorities

The text programs' "points of interest" were grouped into themes, called "groups" in the appendices, because several of the points of interest shared an overall theme. The two highest-rated themes were Quality and "Social/Community impact." For the full details of the results, see the appendices at the bottom of this page.

The specific points of interest that were rated the highest were:

| Points of interest | Total points |
|---|---|
| Measuring Quality | 81 |
| Community Impact | 35 |
| Needs Assessment | 27 |
| Program Knowledge | 6 |
| Importance of Impact | 5 |
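
The three largest theme totals can be cross-checked against the per-item totals in Appendix 1. The sketch below is a hypothetical aggregation, assuming every "Quality (...)" group description rolls up into the Measuring Quality theme. Only items that received votes are listed, and the names `ITEM_TOTALS` and `theme_totals` are illustrative.

```python
from collections import defaultdict

# (group description, total points) for voted items in three Appendix 1 themes.
ITEM_TOTALS = [
    ("Quality (Overall)", 35),
    ("Quality (Data)", 15),
    ("Quality (Diversity)", 8),
    ("Quality (Readability)", 8),
    ("Quality (Engagement)", 7),
    ("Quality (Sources)", 5),
    ("Quality (Content)", 3),
    ("Community", 20),
    ("Community", 8),
    ("Community", 4),
    ("Community", 3),
    ("Needs Assessment", 19),
    ("Needs Assessment", 8),
]


def theme_totals(items):
    """Sum points per theme, treating 'Quality (X)' as the theme 'Quality'."""
    totals = defaultdict(int)
    for group, points in items:
        # Assumption: the parenthetical is a sub-label of the theme.
        theme = group.split(" (")[0]
        totals[theme] += points
    return dict(totals)


print(theme_totals(ITEM_TOTALS))
# {'Quality': 81, 'Community': 35, 'Needs Assessment': 27}
```

The remaining themes are left out of the sketch because the mapping from group descriptions to themes is not spelled out on this page.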

Image program priorities

The rating system for the second group differed; however, the main priorities for this group still surfaced. They included:

| Points of interest | Total points |
|---|---|
| Quality (not specific enough) | 9 |
| Use of Wikimedia images outside of projects | 9 |
| Proportions of images used | 5 |
| Geographic (and other) diversity, spread | 4 |
| Number of image downloads | 3 |

Appendices

For your reference, we include below all the data and photographs we took at the event to help you understand the data collected and the results above.

Appendix Key

| Category number | Description |
|---|---|
| 1 | What do they tell us? |
| 2 | What don't they tell us? |
| 3 | How else might we measure? |

| Color code | Point value |
|---|---|
| Red | 3 points |
| Yellow | 2 points |
| Green | 1 point |

Appendix 1: Text Programs

| Number | Type | Category | Points of interest | Group | Group (desc) | Red | Yellow | Green | Total points |
|---|---|---|---|---|---|---|---|---|---|
| 13 | Text | 2 | Quality (and Return on Investment (RoI) in terms of quality) | 2 | Quality (Overall) | 9 | 3 | 2 | 35 |
| 18 | Text | 2 | Social/community impact | 3 | Community | 5 | 0 | 5 | 20 |
| 34 | Text | 3 | Needs assessment | 7 | Needs Assessment | 0 | 8 | 3 | 19 |
| 40 | Text | 3 | Pairing qualitative and quantitative information | 0 | Quality (Data) | 3 | 2 | 2 | 15 |
| 24 | Text | 2 | No offline volunteers included | 0 | Community | 0 | 2 | 4 | 8 |
| 25 | Text | 2 | Gender & diversity of participants | 0 | Quality (Diversity) | 1 | 1 | 3 | 8 |
| 33 | Text | 3 | Identify gaps (e.g., missing articles by unsuccessful search) | 7 | Needs Assessment | 1 | 1 | 3 | 8 |
| 35 | Text | 3 | Software that measures the complexity of language, readability | 0 | Quality (Readability) | 1 | 2 | 1 | 8 |
| 39 | Text | 3 | Talk page analysis | 0 | Quality (Engagement) | 0 | 3 | 1 | 7 |
| 26 | Text | 2 | How to improve processes | 4 | Process | 2 | 0 | 0 | 6 |
| 37 | Text | 3 | Number of references in certain articles | 6 | Quality (Sources) | 1 | 1 | 0 | 5 |
| 20 | Text | 2 | Community building | 3 | Community | 1 | 0 | 1 | 4 |
| 14 | Text | 2 | Quality assessment is limited (What does # of bytes tell us?) | 2 | Quality (Content) | 1 | 0 | 0 | 3 |
| 19 | Text | 2 | Fun or "engaged" | 3 | Community | 1 | 0 | 0 | 3 |
| 29 | Text | 3 | Pre/post survey on projects | 0 | Documenting | 0 | 1 | 1 | 3 |
| 8 | Text | 2 | Importance of articles (views) | 1 | Importance | 0 | 1 | 0 | 2 |
| 38 | Text | 3 | Encourage event documentation (record video like Wikipedia Zero) | 0 | Documenting | 0 | 1 | 0 | 2 |
| 1 | Text | 1 | Raw number increases | 0 | Documenting | 0 | 0 | 0 | 0 |
| 2 | Text | 1 | Sets a comparison point (baseline) | 0 | Documenting | 0 | 0 | 0 | 0 |
| 3 | Text | 1 | Program resource inputs lead to edits | 0 | Program Knowledge | 0 | 0 | 0 | 0 |
| 4 | Text | 1 | Easier to collect data in a broader sense | 0 | Program Knowledge | 0 | 0 | 0 | 0 |
| 5 | Text | 1 | Gives us relative proportion of people that make contribution | 0 | Program Knowledge | 0 | 0 | 0 | 0 |
| 6 | Text | 2 | Importance of changes made (grammar vs. important content) | 1 | Importance | 0 | 0 | 0 | 0 |
| 7 | Text | 2 | Hard to see the categories of the articles (do they fill a gap?) | 1 | Importance | 0 | 0 | 0 | 0 |
| 9 | Text | 2 | Topics covered/diversity | 1 | Importance | 0 | 0 | 0 | 0 |
| 10 | Text | 2 | How a new participant learned from articles created or improved | 1 | Importance | 0 | 0 | 0 | 0 |
| 11 | Text | 2 | Level of article reorganization | 1 | Importance | 0 | 0 | 0 | 0 |
| 12 | Text | 2 | Quality of articles (most won't be good/featured) | 2 | Quality (Content) | 0 | 0 | 0 | 0 |
| 15 | Text | 2 | Reverts and deletions (after a workshop) | 2 | Quality (Content) | 0 | 0 | 0 | 0 |
| 16 | Text | 2 | Number decreases can be a good thing | 2 | Quality (Content) | 0 | 0 | 0 | 0 |
| 17 | Text | 2 | Quality of edits made (Are they adding good stuff or just junk?) | 2 | Quality (Content) | 0 | 0 | 0 | 0 |
| 21 | Text | 2 | Do they feel welcome | 3 | Community | 0 | 0 | 0 | 0 |
| 22 | Text | 2 | Do people stay/increase their activity? | 3 | Community | 0 | 0 | 0 | 0 |
| 23 | Text | 2 | Relative to %, i.e. 5 new active editors but workshop had 50,000 attendees | 3 | Community | 0 | 0 | 0 | 0 |
| 27 | Text | 2 | How to tweak our programs to get more active edits/editors | 4 | Program Knowledge | 0 | 0 | 0 | 0 |
| 28 | Text | 2 | Scalability | 0 | Program Knowledge | 0 | 0 | 0 | 0 |
| 30 | Text | 3 | Measure quality by traffic to articles? | 5 | Quality (Reach) | 0 | 0 | 0 | 0 |
| 31 | Text | 3 | Changes in traffic | 5 | Quality (Reach) | 0 | 0 | 0 | 0 |
| 32 | Text | 3 | Cross-reference information - intrawiki pagerank project (wikirank) | 6 | Quality (Reach) | 0 | 0 | 0 | 0 |
| 36 | Text | 3 | Gather numbers to be retweeted/facebook posted, google plus of certain articles | 0 | Needs Assessment | 0 | 0 | 0 | 0 |
| 41 | Text | 3 | Re-use of incoming links (like page rank) and outgoing links | 0 | Needs Assessment | 0 | 0 | 0 | 0 |

Appendix 2: Media Programs

| Number | Type | Category | Points of interest | Total points |
|---|---|---|---|---|
| 42 | Media | 1 | Categorizing makes it easy to count (and show) | 1 |
| 43 | Media | 1 | Proportions of images used | 5 |
| 44 | Media | 1 | Use of images on projects | 0 |
| 45 | Media | 1 | Number contributors (new and active) | 0 |
| 46 | Media | 1 | Additional materials for project | 0 |
| 47 | Media | 2 | Number new instances of an image (e.g. new monuments with images after WLM) | 0 |
| 48 | Media | 2 | Number views | 2 |
| 49 | Media | 2 | Average number images per contributor | 0 |
| 50 | Media | 2 | Usage of images outside of Wikimedia projects | 9 |
| 51 | Media | 2 | Geographic (and other) diversity, spread | 4 |
| 52 | Media | 2 | Number downloads | 3 |
| 53 | Media | 2 | Quality (not specific enough) | 9 |
| 54 | Media | 2 | Negative impact (i.e. images that take up volunteer time to delete) | 3 |
| 55 | Media | 2 | Contributions from museums | 1 |
| 56 | Media | 2 | Amount of coverage in national news | 0 |
| 57 | Media | 3 | Quality - use jury ratings | 0 |
| 58 | Media | 3 | Use of knowledge materials | 2 |
| 59 | Media | 3 | GLAMorous tool to include ability to search by timeframe | 3 |
| 60 | Media | 3 | Split up "Quality" | 2 |