Talk:Wikimedia Foundation Scoring Platform team


Story: RecentChanges Patrolling


Every day, English Wikipedia gets about 160k edits that need to be reviewed for vandalism and other types of damage. Based on how long it takes someone to review the edits used to train ORES, I estimate that an editor can review 50 edits in about 5 minutes. With that in mind, it should take about 267 hours of work to review edits to English Wikipedia every day. That's the equivalent of 33 people working 8 hours every day.

With the edit quality models deployed in ORES, we can filter out ~90% of all edits because we're highly confident that they're OK and don't need review. This reduces the total number of hours that patrollers need to spend reviewing edits to 26.7 per day, which means that, when ORES is in use, we only need 4 people working 8 hours per day -- or 33 people working for less than an hour each. --Halfak (WMF) (talk) 19:41, 4 February 2017 (UTC)
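The back-of-the-envelope arithmetic above can be sketched as follows (the 50-edits-per-5-minutes review rate and the 90% filtering share are the estimates from the comment, not measured constants):

```python
# Sketch of the RecentChanges patrolling workload estimate.
edits_per_day = 160_000
review_rate = 50 / 5  # edits reviewed per minute (50 edits per ~5 minutes)

# Total review workload without any filtering.
hours_per_day = edits_per_day / review_rate / 60
print(round(hours_per_day, 1))  # ~266.7 hours/day, i.e. ~33 people at 8 h/day

# ORES confidently marks ~90% of edits as OK, leaving 10% to review.
filtered_share = 0.90
remaining_hours = hours_per_day * (1 - filtered_share)
print(round(remaining_hours, 1))  # ~26.7 hours/day, i.e. ~4 people at 8 h/day
```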

Story: WikiProject Medicine


Based on a request from members of WikiProject Medicine, User:Nettrom and I used the article quality model that we'd engineered for ORES to help Wikipedians find articles that had been classified as stubs but were no longer stubs. Using the classifier, we were able to reduce the number of pages that needed to be viewed for re-assessment from 9031 articles to 747 -- a decrease in the overall workload of about 91.7%!

See the writeup: m:Research:Screening WikiProject Medicine articles for quality. --Halfak (WMF) (talk) 19:42, 4 February 2017 (UTC)

Shouldn't we be putting someone else in charge of ORES so that Aaron can focus on research?


I'm a somewhat unusual type of Research Scientist. I'm a system builder, and ORES is an effective research platform for exploring the integration of social production community practices and advanced technologies. My leadership of ORES' direction is core to my research agenda.

I've already been acting in a product management capacity for the ORES project, and I've been managing small grants (proposals and reports) to keep a small group of people partially funded for short periods of time to work on the project. Any long-term resource support will extend the time and energy I can use to focus more broadly on the project's direction and high-level research topics.

The research needs of ORES are already relatively distributed. In order to operate more effectively as a principal researcher, I advise and consult with many external researchers. If I get the support I'm asking for, I'll have more time for leading and advising research on ORES and its context. These research projects will form the foundation of the documentation necessary for accountability and the vision necessary for direction-setting. --Halfak (WMF) (talk) 19:42, 4 February 2017 (UTC)

Does ORES need to be "productionized"?


ORES is already running in production at high capacity and with a high level of up-time. The code has passed security review and has had a more casual review from senior engineers at the Wikimedia Foundation. It doesn't need to be "productionized"; rather, the system needs to grow and be extended to accommodate the needs that have emerged and will continue to emerge. Bringing new models to production, improving performance, and extending accountability are among our goals.

To be fair, Wiki labels does need to be productionized and there's currently a collaboration with the Collaboration Team to start that process.

"Meta-ORES", a robust false-positive and feedback-gathering system, is only a proposal at this point, but clear user needs have made its necessity evident. Bringing this accountability mechanism to production will require a lot of work. --Halfak (WMF) (talk) 19:43, 4 February 2017 (UTC)

What kind of expertise will the engineers assigned to this team need to have?


Modeling and analysis skills can come from me, the Research Team, and our external collaborators. Most of the engineering that needs support is basic web development and some distributed-processing systems work. So, anyone with a solid background in software engineering around web technologies and a tolerance for the Python ecosystem should be able to quickly gain the competencies that we need.

At least one engineer will need to be at the "senior" level so that they can draw from experience to help architect the system and make decisions about which technologies to adopt. --Halfak (WMF) (talk) 19:43, 4 February 2017 (UTC)

What would happen in the next 3 years if we manage to get support and have Aaron lead?


There will be three focuses of effort over the next three years.

Accountability mechanisms
In our past work, we've seen the need for accountability mechanisms emerge. These will both empower our users to refute ORES' predictions and give us a better opportunity to discover problems with prediction models. Every problem is an opportunity to improve fitness. By implementing open accountability mechanisms, we'll provide a legitimate, alternative view of how algorithms ought to operate in online spaces (as opposed to the approach of Google, Facebook, Twitter, and other big tech companies).
New prediction models
Expand the types of predictions that ORES makes. Currently, ORES makes predictions about edit quality and article quality. We're working on modeling the aggressiveness of conversations, the types of changes made in edits, the quality of new page creations, the quality of Wikidata items, and the importance of Wikipedia articles. With more resources, we'll finish up those projects and get models deployed so that developers can start experimenting with them. With each new model, we open the doors to new types of products and technologies for making Wikipedians' work easier.
Support more wikis
Currently, we have basic support for 23 wikis and advanced support for 8. In order to speed up the rate at which we extend support to new wikis, we need to improve the tools that Wikipedians use to record their judgements for training the models. We'll also need to dedicate more time to liaising with communities and recruiting confederates to help with adoption and false-positive reporting.
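As a concrete illustration of how developers consume these models, here is a minimal sketch of building a request against ORES' public scoring service. The `/v3/scores/` URL shape and the `damaging`/`goodfaith` model names reflect the public ORES API as I understand it; the revision ID is hypothetical, and model availability varies by wiki.

```python
# Minimal sketch: constructing a request to the ORES v3 scoring API.
from urllib.parse import urlencode

ORES_HOST = "https://ores.wikimedia.org"

def scores_url(context, revids, models):
    """Build a v3 scores URL for one or more revisions and models.

    Multiple values are joined with "|" per the API's convention.
    """
    query = urlencode({
        "revids": "|".join(str(r) for r in revids),
        "models": "|".join(models),
    })
    return f"{ORES_HOST}/v3/scores/{context}/?{query}"

# Hypothetical revision ID; a real client would GET this URL and read the
# returned JSON, e.g. the probability that the edit is damaging.
url = scores_url("enwiki", [123456], ["damaging", "goodfaith"])
print(url)
```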

--Halfak (WMF) (talk) 19:44, 4 February 2017 (UTC)

How will other teams be affected by this proposal?

Legal
We publish datasets that often raise privacy and other sensitive-information concerns. In the past we have worked with Legal to make sure that these publications have been reviewed. If we increase our capacity as planned, we'll likely double the rate at which we publish these kinds of datasets.
Technical writing
Documentation of the systems we develop will grow along with the systems. We'll need part of a technical writer's time to help keep our documentation high quality.
Community engagement
A large part of our work involves engaging with various Wikimedia communities so that they can help us gather labeled data to train new models and so that they can tell us about problems/opportunities that they see. We'll need liaisons to help with recruiting local confederates in different wiki communities to help translate and advocate.
Research
In order to maintain a steady stream of new development and to take advantage of the research platform, our team will regularly need to work with external researchers and to recruit highly skilled interns. We'll need to work with the Wikimedia Research team to recruit these collaborators and to interview/vet interns. There will also likely be direct collaborations with Wikimedia Research on the development and deployment of new models (Research-and-Data) as well as evaluations of the models' utility to users (Design-Research).
Security
Currently, the ORES infrastructure has been reviewed by the security team and does not pose a substantial security risk to our private data or our users. However, new developments like the proposed Meta-ORES accountability system will enable direct contributions from users that include free-form text. Reviewing the security and privacy of these mechanisms will require substantial effort on the part of the security team initially, and then follow-up reviews for any large changes to the contribution, review, and suppression mechanisms. (Note that a preliminary document describing these specific concerns has already been filed with Privacy and Security.)

--Halfak (WMF) (talk) 19:44, 4 February 2017 (UTC)

Why isn't Product support enough support?


Product teams who are using ORES need to use it for a specific, user-value focused purpose. We've had success in the past with trading resources (Aaron's consulting for Engineering support for ORES & related projects), but that support is generally specific to a particular component of the project and it doesn't involve any sort of long-term commitment.

I need engineers working on ORES to be able to focus on ORES so that they can think slowly and subconsciously about the direction of the project and therefore make long-term proposals/contributions. I think this is essential for the project's success. Product is not in a position to assign engineering resources to the project on a multi-year timescale. --Halfak (WMF) (talk) 19:45, 4 February 2017 (UTC)

How would this be sustainable at such a small scale?


Two reasons: external collaborations and contracts/internships.

A lot of the work for ORES involves external collaborations. We receive substantial, though sporadic, contributions from a large set of volunteers who find ORES to be useful. Further, most of our new model development is done with external researchers. For example, the modeling initiatives that we have going right now are:

  • edit types (Kraut & Yang @ CMU) -- predicts the type of change made in an edit (e.g. copy-edit, simplification, process, wikification, etc.)
  • draft quality (N. Jullien @ Telecom Bretagne) -- flags new page creations for spam, vandalism, and personal attacks, and allows slower review of other, less problematic article drafts.
  • detox (Ellery et al.) -- flags aggressive talk page comments
  • article importance (Warncke-Wang @ UMN) -- predicts the importance of an article to the whole project and within a specific subject-space.
  • academic/pop-culture/other (Terveen & Shiroo @ UMN) -- predicts whether an article is generally about an academic subject, a pop culture subject, or something else.

We'll use the contracting budget to invite some of our collaborators (volunteers and external researchers) to work with us as contractors & interns. This will give young subject-matter experts opportunities to address problems core to ORES and to potentially pursue a path towards a more substantial engagement with the team. I've had a lot of success in the past working effectively with researchers and volunteers on short-term contracts through IEG. --Halfak (WMF) (talk) 19:45, 4 February 2017 (UTC)

I should note that we'll also be able to operate at a small scale because we won't focus directly on end-user-facing problems that Product teams are better equipped to deal with. Engineering for users working on-wiki requires a lot more resources (design, engineering, technical writing, etc.). We'll be focusing on engineering for developers, and that's why the proposal includes tech writer resources. --Halfak (WMF) (talk) 19:46, 4 February 2017 (UTC)

Where should this team live and why?


Technology is responsible for platforms. Platforms serve many audiences and many different use-cases. Technology prioritizes long term sustainable infrastructure and serves a developer audience.

Product develops features that serve a specific end-user-value. Product prioritizes delivering on the highest impact use-cases.

The ORES service is a platform -- an infrastructure for other work. The technologies around ORES (revscoring & Wiki labels) provide an ecosystem for the platform. ORES' audiences include researchers, developers, tool-builders, and product teams. Each model that ORES uses to produce a new score is a platform with many audiences itself. For example, the "edit quality" models are used in several different wiki tools and for many different research projects. The vision of ORES includes discovering effective means for developing long term infrastructure for a large, general class of valuable tools and analysis.

For these reasons, I think that it makes sense to have the ORES Team live within Technology. --Halfak (WMF) (talk) 19:48, 4 February 2017 (UTC)

I should mention that I also see a clear boundary for collaboration around ORES between the Product teams and the Tech ORES Team. The ORES Team will prioritize the development of infrastructure and scoring models that enable a general class of AIs based on research insights (e.g. see Wikipedia's quality control & newcomer socialization tradeoffs).

Product teams, tool developers, and researchers will be able to use ORES to serve different, more direct applications. For example, the Collaboration Team is currently using ORES' edit quality models to improve quality control in Wikipedia, and the Search team is experimenting with using ORES' article quality models in search ranking.

A similar divide manifests around our relationship with volunteer tool developers, and it reflects one of the key insights of my research into the engineering of technologies for supporting Wikipedians' work: it's most effective to enable others to innovate around their own points of view. --Halfak (WMF) (talk) 19:49, 4 February 2017 (UTC)

Duplicate info?


Most of this, except for the stories, is on the main page; it is confusing to find it here as well. Elitre (WMF) (talk) 19:47, 1 March 2017 (UTC)

Indeed. So my original plan was to update the FAQ as conversations about each point continued. However, it seems that people have not been interested in discussing the FAQ beyond my initial responses. I agree that this is confusing as it stands. Currently, there is slightly more discussion on the talk page than in the proposal document, so I'd rather not just archive all of that. Maybe it would make more sense to integrate the discussion points into the proposal and archive the talk page afterwards. --Halfak (WMF) (talk) 23:23, 1 March 2017 (UTC)

Meta vs. Mediawiki.org


While we are talking about confusion, all the other WMF Engineering team pages are on mediawiki.org. I can see how it makes sense to describe this team on the same wiki where the research initiatives and ORES documentation are, but it still breaks expectations (and possibly tooling for reports, although no one seems to bother to do that on-wiki these days anyway). --Tgr (WMF) (talk) 20:05, 1 March 2017 (UTC)

Hey! Tgr, there are other teams described on meta (e.g. m:Growth) but regardless, this is a proposal, not a team's documentation page. --Halfak (WMF) (talk) 23:17, 1 March 2017 (UTC)