Grants:IEG/Revision scoring as a service/Renewal


This project is requesting a 6-month renewal of the grant, to continue work in the areas described below.

Scope


In this 6-month extension, we'll expand our service to Wikimedia communities in three ways:

  • We'll add prediction models for Wikipedia 1.0 assessment ratings[1] (see the sketch after this list)
  • We'll add a prediction model for edit type classification[2]
  • We'll extend the Wiki labels UI so that Wikipedians can create and administer their own campaigns[3]
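
To make the service side of these additions concrete, here's a minimal sketch of how a client tool might request one of the new scores over HTTP. The host, path, and the "wp10" model name are illustrative assumptions for a not-yet-deployed model; only models like "reverted" exist today.

    import requests

    # Endpoint shape and the "wp10" model name are assumptions;
    # point this at wherever the ORES service is deployed.
    resp = requests.get(
        'https://ores.wikimedia.org/v3/scores/enwiki/',
        params={'models': 'wp10', 'revids': '123456'},
    )
    resp.raise_for_status()
    print(resp.json())  # a prediction plus per-class probabilities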

There are also 3rd-party developers working to apply AI to other problems in Wikimedia projects. We'll be spending substantial time and energy supporting their projects. Here are the projects we know about thus far:

Finally, another key outcome of this grant period will be a plan for the long-term sustainability of this project. IEG funding has gotten this project off the ground, but it's time to level up and find a home. We'll be exploring many directions, including Wikimedia Foundation engineering (e.g. the mw:Services team), mw:Research and Data, and product teams like mw:Collaboration. We've also managed to fund one of our major contributors through en:Wikimedia Deutschland, so we'll be talking to them about extended support if our work on Wikidata is successful. By the end of this 6-month period, we plan to have a plan! In the meantime, we'll be making the value and needs of our project clear to these parties.

Budget

For hardware cost calculations, Digital Ocean prices were used as a basis. We use Wikimedia Labs instead, which lets us avoid these costs. Also note that we intend to scale our hardware usage up with user demand using Labs.
Instance Name   Launch Time            CPUs  RAM    Storage  Price
ores-compute    26 May 2015 06:27:49   8     16 GB  160 GB   $160/mo x 6 = $960
ores-web-01     10 June 2015 13:08:31  4     8 GB   80 GB    $80/mo x 6 = $480
ores-web-02     10 June 2015 13:08:38  4     8 GB   80 GB    $80/mo x 6 = $480
labels          4 May 2015 19:51:18    2     4 GB   40 GB    $20/mo x 6 = $120
ores-lb-01      10 June 2015 13:07:40  2     4 GB   40 GB    $20/mo x 6 = $120
ores-redis-01   10 June 2015 14:31:35  2     4 GB   40 GB    $20/mo x 6 = $120
ores-staging    10 June 2015 10:34:56  2     4 GB   40 GB    $20/mo x 6 = $120
Total           -                      24    48 GB  480 GB   $400/mo x 6 = $2,400
Note that:
  • EpochFail/Halfak (WMF) will continue on as project manager and technical lead with a mixture of volunteer time and his WMF staff time
  • Yuvipanda/YuviPanda (WMF) will continue to assist the project with maintenance & performance improvements, also with a mixture of volunteer time and his WMF staff time
Contractors for development & community organizing stipends - 40 hours per week for 25 weeks @ $30/hr = $30,000
  • We intend to hire two or more contractors/grantees and divide this between them as their schedules permit.
    • 20 hours per week for 25 weeks @ $30/hr = $15,000 - とある白い猫
    • 20 hours per week for 25 weeks @ $30/hr = $15,000 - (other)

Rationale

We have come a long way since the start of our project, and we see significant room for improvement even beyond our goals of the past six months. Everything we are hearing suggests that there is demand for our work. We don't want to lose momentum, and we want to focus on implementing the features that seem to be in highest demand (which we have listed above).

Measures of success

Adoption rate
We will instrument logging in the API service to track how many scores are requested. Success means gaining wide adoption: it could be that a single tool uses our scores heavily or that many tools use the service lightly. The most important measure of our impact is how often our scores are *used* externally. Using statsd and graphite, we will instrument the funnel of score generation to know how many scores are requested and served, and how fast our responses are.
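As a rough illustration of the funnel instrumentation we have in mind, here is a minimal sketch using the Python statsd client. The metric names, the in-process cache stand-in, and the `scorer` callable are illustrative assumptions, not the deployed code.

    import statsd  # Python statsd client (pip install statsd)

    # Client pointed at our statsd/graphite host; the address is illustrative.
    metrics = statsd.StatsClient('localhost', 8125, prefix='ores')

    score_cache = {}  # in-process stand-in for the real cache layer

    def score_revision(wiki, model, rev_id, scorer):
        """Serve one score, counting requests, cache hits/misses, and timing."""
        metrics.incr('scores_requested')
        with metrics.timer('score_processing'):
            key = (wiki, model, rev_id)
            if key in score_cache:
                metrics.incr('cache_hit')
                score = score_cache[key]
            else:
                metrics.incr('cache_miss')
                score = scorer(rev_id)  # stands in for a model's score() call
                score_cache[key] = score
        metrics.incr('scores_served')
        return score

Counters and timers like these feed graphite dashboards, which is how we'd read off request volume, cache hit rate, and response time over the grant period.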
As a supplement, we'll track the set of tools that use revision scoring as best we can using word of mouth and UserAgent processing (see our manually curated list). We'll be successful if we can either sign on a major tool (e.g. en:WP:Huggle) or double the number of lesser tools and analyses on the list by the end of the 6 months.
New models
We will use standard model training and testing strategies to ensure the accuracy of new models for predicting article quality and edit types. New models will be deployed first for enwiki, and later for other wikis as needed and where we can obtain liaison support.
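As a sketch of what "standard training and testing strategies" means here, the following scikit-learn snippet trains a classifier and evaluates it on a held-out test set. The synthetic data and six-class setup are stand-ins for real per-revision feature vectors and Wikipedia 1.0 labels, and gradient boosting is just one plausible learner.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for per-revision features and wp10-style labels;
    # the real pipeline extracts features from article content and metadata.
    X, y = make_classification(n_samples=2000, n_features=20, n_classes=6,
                               n_informative=10, random_state=42)

    # A held-out test set keeps reported accuracy honest about
    # revisions the model has never seen.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    model = GradientBoostingClassifier().fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))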
System performance
We will analyze and improve the performance of the ORES service so that most requests are served from the cache (1,000-10,000 times faster) and a request for 50 'reverted' scores has a median response time of less than 6 seconds.
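A benchmark along these lines would check the response-time target; the endpoint URL is an assumption, and `fifty_recent_rev_ids` must be filled with 50 real revision IDs.

    import statistics
    import time

    import requests

    # Endpoint is illustrative; point it at the deployed ORES host.
    URL = 'https://ores.wikimedia.org/v3/scores/enwiki/'

    def median_batch_latency(rev_ids, trials=20):
        """Median wall-clock time to fetch 'reverted' scores for a batch."""
        timings = []
        for _ in range(trials):
            start = time.time()
            resp = requests.get(URL, params={
                'models': 'reverted',
                'revids': '|'.join(str(r) for r in rev_ids),
            })
            resp.raise_for_status()
            timings.append(time.time() - start)
        return statistics.median(timings)

    # With a batch of 50 IDs, success means this prints a value under 6.0.
    # print(median_batch_latency(fifty_recent_rev_ids))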

Notes

  1. e.g. we've already used this model to support WikiProject Medicine re-assessment triaging -- we'd like to host the model on ORES and build a simple self-serve UI.
  2. e.g. questions about edit types raised in the Visual Editor experiment and future interventions could be more easily answered. We're also working with researchers who would like to use this service to build editor expertise profiles.
  3. Right now, starting new campaigns and doing administrative work requires access to the database. We'd like to extend the system to be used by any Wikipedia editor.

Community discussion


Notification


Endorsements


Do you think this project should be continued for another 6 months? Please add your name and comments here. Other feedback, questions or concerns from community members are also highly valued, but please post them on the talk page of this proposal.

  •   Support While many probably find it difficult to evaluate this project, since not enough time has passed for many end-user tools to adopt it, the underlying idea clearly holds promise and I think it should be granted the extension to allow it to grow to its potential.--Anders Feder (talk) 07:08, 5 July 2015 (UTC)
  •   Support I totally support this project and the underlying goals behind it. It may seem like a lot of money, but I think it's a bargain at twice the price if the end product is something that is used by community members to gain a better grip on editing behaviors while estimating quality at the same time. Jane023 (talk) 09:50, 5 July 2015 (UTC)
  •   Support Although I have followed this project for only a short time, I have already noticed its fruits in my home wiki. I believe that its renewal is not only necessary but also important for expanding the system, thus offering more support to the wikis. --Leon saudanha (talk) 23:53, 5 July 2015 (UTC)
  •   Support I am using this tool on pt-wiki. It is very useful because vandalism is labeled on my watchlist. The renewal will support the tool's improvement. I think it is a great tool to fight against vandalism on all wikis. Ixocactus (talk) 20:56, 6 July 2015 (UTC)