Notes from the Quarterly Review meeting with the Wikimedia Foundation's Discovery Team, October 5, 2015, 10:30am - 11:30am PDT.

Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material instead.

Presentation slides from the meeting

Present (in the office): Lila Tretikov, Tomasz Finc, Greg Grossmeier, Terry Gilbey, Dan Garry, Max Semenik, Tilman Bayer (taking minutes), Stephen LaPorte, Kevin Leduc, Rachel diCerbo, Moiz Syed; participating remotely: Luis Villa, Katherine Maher, Wes Moran, Trevor Parscal, Arthur Richards


Wes: welcome

[slide 1]

Dan:
Discovery team is interested in how people discover things on Wikimedia sites
e.g. via maps too [not just search]
only started measuring these KPIs, so no year-over-year comparison
Lila: can we start monitoring referrer traffic? yes, goal for this q
Lila: no budging on the zero results rate? we'll get to that
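
[For illustration: referrer monitoring of the kind Lila asks about typically starts by classifying each request's referrer header by host. A minimal Python sketch; the hostname list and categories are assumptions, not the team's actual design.]

    from urllib.parse import urlparse

    # Toy referrer classifier; the hostname list and category names are
    # illustrative assumptions, not Discovery's actual pipeline.
    SEARCH_ENGINES = {"www.google.com", "www.bing.com", "duckduckgo.com"}

    def classify_referrer(referrer):
        host = urlparse(referrer).hostname
        if not host:
            return "none/direct"
        if host in SEARCH_ENGINES:
            return "external search"
        if host.endswith((".wikipedia.org", ".wikimedia.org")):
            return "internal"
        return "other external"

    print(classify_referrer("https://www.google.com/search?q=wiki"))  # external search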

[slide 2]

(Dan:)
chose to focus on the zero results rate as a proxy for user satisfaction (assuming people are not happy with 0 results)
found that bots account for a lot of the search traffic
objective was correct, but not the right measure
Lila: long story short - 30% does include bots? yes
what's the number without bots?
Dan: don't know exact number
Lila: expect it to be higher or lower with bots filtered out?
Dan: lower; e.g. on Dutch Wiktionary, 99% of search traffic came from one bot
Wes: filter bot traffic on everything; this was part of the learning
Lila: where do we still have prefix search?
Dan: for search box on top of page, also on apps
because it's fast
Lila: don't necessarily need A/B
can switch completely for like 2h, observe effect
Dan: probably not ready for putting it to all users yet
Lila: goal for this q on this?
Dan: won't directly try to impact it
focus mainly on user satisfaction
if 0 results rate is still 33% for humans, it absolutely needs to be a goal
Wes: decided to take other goals like lang support as primary goals
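
[A minimal sketch of the recomputation implied by this discussion: the headline zero results rate mixes bot and human queries, so it has to be recalculated per traffic class. The log field names are hypothetical.]

    # Sketch: recompute the zero results rate with and without bot traffic.
    # "searches" stands in for parsed search logs; the field names
    # ("is_bot", "num_results") are hypothetical.
    def zero_results_rate(searches, include_bots=True):
        rows = [s for s in searches if include_bots or not s["is_bot"]]
        zero = sum(1 for s in rows if s["num_results"] == 0)
        return zero / len(rows) if rows else 0.0

    searches = [
        {"is_bot": True, "num_results": 0},   # e.g. the Dutch Wiktionary bot
        {"is_bot": False, "num_results": 12},
        {"is_bot": False, "num_results": 0},
    ]
    print(zero_results_rate(searches))                      # ~0.67 with bots
    print(zero_results_rate(searches, include_bots=False))  # 0.5 humans only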

[slide 3]

(Dan:) WDQS (Wikidata Query Service): e.g. a list of US presidents born before X (see the example query at the end of this section)
Terry: so launched a new service, still exploring usage
is there a go/no go criterion?
Dan: no, just going to maintain it this q
then look at usage (qualitative / quantitative)
and feature requests (this is still a stripped-down version)
Terry: want to avoid feature creep
Lila: who is the customer?
Dan: not sure
initially motivated by WikiGrok (which was shelved)
were 95% there anyway, decided to roll out and see
Wes: we also had Wikidata team as stakeholder
--> Lila: work with Toby's team to see if we can serve some of this [WDQS] data
internal need for this
Wes: had meetings with them
Lila: figure out Venn diagram between Wikidata and infoboxes on WP
Wes: also, natural language queries
found in our surveys etc. that people prefer natural language queries
Lila: yes, Google has trained people to do that
Dan: a prototype demoed at the hackathon takes natural language queries and translates them into WDQS queries
Greg: do you have a policy on operational support for your project from your team members? (man hours, etc)
Tomasz: should not take website down
ours are tier 3 services
---> Lila: might be a good thing to ask Mark to produce a document on tier 1 vs. tier 2 vs. tier 3 services (support criteria, ...)
Wes: yes, need to specify that
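
[For context, a worked example of the kind of WDQS query Dan mentions ("US presidents born before X"), run against the public query.wikidata.org endpoint; the specific SPARQL and the 1900 cutoff are illustrative, not from the meeting.]

    import requests

    QUERY = """
    SELECT ?president ?presidentLabel ?dob WHERE {
      ?president wdt:P39 wd:Q11696 ;  # position held: President of the United States
                 wdt:P569 ?dob .      # date of birth
      FILTER(?dob < "1900-01-01"^^xsd:dateTime)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "wdqs-example/0.1"},
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["presidentLabel"]["value"], row["dob"]["value"])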

[slide 4]

Dan: maps
Wikivoyage maps on Labs
Lila: how hard is it to integrate a map into article?
Tomasz: we have a working prototype for VE
Lila: before that, cluster needs to be ready for usage in production, does Mark know?
Wes, Tomasz: yes
Lila: time frame?
Wes: need to know how it's being used outside WMF
before we can assess full prod usage
e.g. do we build a template, etc.
Lila: understand there is complexity in detail, but need to know when it's production ready
need a quarter to figure this out?
Wes: probably yes
Lila: OK
Mark (Ops): needs preparation time
Lila: this is a critical feature. everything on WP should have a map
Terry: already seeing increase in engineering productivity, but still should get better at notifying people who will need to be involved downstream
Wes: agree, already understanding usage inside and outside much better
Terry: shouldn't need to plan way ahead before we can innovate (fill out 15 pages of capital requests ... ;)
need to be able to experiment
Tomasz: agree, I think we hit this many times historically
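
[Background for the production-readiness question: the maps cluster serves standard slippy-map tiles addressed by zoom/x/y. A minimal fetch sketch; the coordinates are arbitrary and the URL scheme is the public tile service's.]

    import requests

    # Fetch one tile from the public Wikimedia tile service behind the
    # Wikivoyage maps work; the zoom/x/y coordinates are arbitrary.
    z, x, y = 4, 8, 5
    url = f"https://maps.wikimedia.org/osm-intl/{z}/{x}/{y}.png"
    tile = requests.get(url, headers={"User-Agent": "maps-example/0.1"})
    with open("tile.png", "wb") as f:
        f.write(tile.content)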

[slide 5]

Dan: search satisfaction KPI
metric is now at 50% (?) which seems very low
Lila: dashboarding is awesome, want this for everything
but how is satisfaction defined?
Dan: first hypothesis was clicks on results
but need to include bounce rate
should still validate this by asking people if they found what they want
Lila: so what's [the definition now]
Dan: clickthrough (currently at 50%) combined with bounce rate
Lila: Goal for this q?
Dan: validate this metric
Lila: fine, but also need goal for the metric itself
Wes: if we don't actually understand how it works, we won't succeed
took longer than we thought
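
[A rough sketch of the clickthrough-plus-bounce definition under discussion; the event fields and the 10-second bounce threshold are invented for illustration, not the team's actual instrumentation.]

    # "sessions" stands in for parsed search sessions; "clicked" and
    # "dwell_seconds" are hypothetical field names.
    def satisfaction_metrics(sessions):
        clicked = [s for s in sessions if s["clicked"]]
        ctr = len(clicked) / len(sessions) if sessions else 0.0
        bounced = sum(1 for s in clicked if s["dwell_seconds"] < 10)
        bounce_rate = bounced / len(clicked) if clicked else 0.0
        return ctr, bounce_rate

    sessions = [
        {"clicked": True, "dwell_seconds": 120},
        {"clicked": True, "dwell_seconds": 3},  # came straight back: a bounce
        {"clicked": False, "dwell_seconds": 0},
    ]
    print(satisfaction_metrics(sessions))  # ~ (0.67, 0.5)
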
Lila: I want one goal - improve our search results ;)
looking at external solutions too? e.g. don't think we need to build our own typeahead feature
Dan: that's built into Elasticsearch
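
[What "built into Elasticsearch" refers to: ES ships a completion suggester for typeahead. A minimal call against a local node using the ES 1.x _suggest API of the period; the index and field names are made up.]

    import requests

    body = {
        "title-suggest": {
            "text": "alber",                     # user's partial input
            "completion": {"field": "suggest"},  # field mapped as type "completion"
        }
    }
    resp = requests.post("http://localhost:9200/articles/_suggest", json=body)
    for option in resp.json()["title-suggest"][0]["options"]:
        print(option["text"], option["score"])
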
--> Lila: Can you link to your goals and other stuff from the dashboard? becoming a central place
--> Lila: can we get performance into KPI?

[slide 6]

(Dan:) other successes and misses
Wes: we publicize tests before we run them
Lila: can other teams like Readership use this A/B infrastructure?
Dan: this in particular is built into search, but yes in principle
--> Lila: Wes, can you sync up with Gabriel and Toby on this
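
[For context, a generic sketch of deterministic A/B bucketing of the sort such infrastructure relies on; this is not CirrusSearch's actual implementation, which the minutes do not describe.]

    import hashlib

    # Hashing a stable session token keeps each user in one variant for
    # the whole test, without storing any assignment server-side.
    def bucket(session_token, test_name, buckets=("control", "variant")):
        h = hashlib.sha256(f"{test_name}:{session_token}".encode()).hexdigest()
        return buckets[int(h, 16) % len(buckets)]

    print(bucket("abc123", "typeahead-test"))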

[slide 7]

Dan: core workflows
dynamic scripting: security issues; had been working on this for ages; had to change extensions that relied on it (e.g. the Translate extension)
Terry: how much effort is now going into improving Elasticsearch?
Dan: maybe 20-30% of the time of the four engineers who work on search

[slide 8]

Dan: core workflows - wikipedia.org
proper code review for that site
at Wikimania talked to the community member who is most involved in maintaining this
Lila: plans for this next q?
Moiz: first run some small A/B tests
e.g. improve typeahead, change size of search box
then move all the knowledge engine stuff there
--> Lila: make small but frequent changes, get used to that
Moiz: yes, we intend to perhaps have 10-20 week-long tests
Dan: Moiz did a lot of mockups for this, both on small changes and more strategic things
run surveys
did not succeed in adding EventLogging to it

[slide 9]

--> Lila: need evaluation of how index is going to be run - e.g. will Wikidata be indexed itself, or serve as index
that decision will need to be taken at the engineering level
e.g. "fork" of Wikidata?
--> Lila: get the goals on the satisfaction metric in two weeks from now
Lila: how many of the zero results are the result of an article missing (on that wiki)?
so we can see how much we could seed with automatic translation
Wes: would be very curious about that too
Tomasz: many people search the English site for things available in other languages
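
[A sketch of how one might measure this: for a query that returned zero results, check whether it exists as a title on another language edition via the MediaWiki API. The API call is real; the workflow around it is an assumption.]

    import requests

    def exists_on(lang, title):
        r = requests.get(
            f"https://{lang}.wikipedia.org/w/api.php",
            params={"action": "query", "titles": title, "format": "json"},
            headers={"User-Agent": "zero-results-example/0.1"},
        )
        pages = r.json()["query"]["pages"]
        return "-1" not in pages  # page id -1 means the title is missing

    print(exists_on("en", "Stroopwafel"), exists_on("nl", "Stroopwafel"))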