Wikimedia monthly activities meetings/Quarterly reviews/Analytics/January 2015

The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's Analytics team, January 29, 2015, 11:00 - 11:30 PST.

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Present (in the office): Leila Zia, Erik Moeller, Abbey Ripstra, Kevin Leduc, Ellery Wulczyn, Dario Taraborelli, Rob Lanphier, Garfield Byrd, Toby Negrin, Andrew Otto, Bob West, Carolynne Schloeder, Terence Gilbey, Tilman Bayer (taking minutes); participating remotely: Dan Andreescu, Aaron Halfaker, Oliver Keyes, Jonathan Morgan

Presentation slides from the meeting

Analytics Engineering

Kevin:
Let's skip the "what we said" part of the slides

What we did

four areas:...

 
[slide 8]

pageviews in vital signs, broken down by the site visited - mobile/desktop
ErikM: should demo the dashboards -
https://metrics.wmflabs.org/static/public/dash --> "add metrics" --> daily pageviews
Damon: how much do we trust this?
Toby: this is legacy data, but the Analytics team stands behind data that we publish
Kevin: Labs hasn't been reliable or performant enough
need another person from Ops to help with database issues. right now it's Sean part-time, need db expert
prototype using Pentaho (open source software)
ErikM: so Pentaho uses new definition?
Kevin: yes, still researching differences
Toby: differences weren't huge
Dario: will talk about that in my part

 
[slide 9]

Kevin: Wikimetrics for Grantmaking
Marcel joined team, worked exclusively on this

 
[slide 10]

 
[slide 11]

datasets for community to consume
hit bottleneck regarding how many events EventLogging can consume; batch inserts removed a bottleneck
ErikM: so these are still Limn dashboards?
Kevin: yes
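
As a rough illustration of why batch inserts relieve this kind of bottleneck (this is not the EventLogging code): a minimal sketch, assuming a hypothetical `events` table in SQLite and the made-up helper names `chunked` and `insert_batched`. The point is only that many rows are written per transaction instead of one.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape for illustration: (schema name, JSON payload).
Event = Tuple[str, str]


def chunked(events: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Yield successive lists of at most `size` events."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def insert_batched(conn: sqlite3.Connection,
                   events: Iterable[Event],
                   batch_size: int = 500) -> int:
    """Insert events in batches, one transaction per batch.

    Returns the number of batches written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (schema_name TEXT, payload TEXT)"
    )
    batches = 0
    for batch in chunked(events, batch_size):
        with conn:  # commits once per batch rather than once per row
            conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
        batches += 1
    return batches
```

The batch size of 500 is an arbitrary assumption; a real consumer would tune it against latency and memory.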

What we learned

[slide 13]

Christian has done a lot of EventLogging maintenance, but he's leaving now
Toby: yes, that was an Ops task we took on

What's next

[slide 15]

Kevin: unique clients - this needs community outreach first
Toby: team feels strongly this should not be done without community consultation

 
[slide 17]

Toby: want to call out that as the Analytics team, we use numbers to evaluate our own work too ;)

 
[slide 18]

ErikM: in case of these [EventLogging outages] we were always able to backfill though
Toby: yes, we can backfill database from log file
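
To make the backfill idea concrete (again, not the team's actual tooling): a minimal sketch, assuming the log file holds one JSON event per line, each with a hypothetical `uuid` field, and a hypothetical `events` table keyed on that field. Replaying the log is then safe to repeat, because rows that already exist are skipped.

```python
import json
import sqlite3
from typing import Iterable


def backfill(conn: sqlite3.Connection, log_lines: Iterable[str]) -> int:
    """Replay log lines into the events table, skipping rows already present.

    Returns the number of rows that were actually added.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (uuid TEXT PRIMARY KEY, payload TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    rows = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the log
        event = json.loads(line)
        rows.append((event["uuid"], line))  # "uuid" is an assumed field name

    with conn:  # single transaction for the whole replay
        # The primary key makes the replay idempotent: duplicates are ignored.
        conn.executemany("INSERT OR IGNORE INTO events VALUES (?, ?)", rows)

    after = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    return after - before
```

Keying on a unique event identifier is the design choice that matters here: it lets the same log be replayed after an outage without double-counting events that did reach the database.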

Research and Data

Dario:

What we said/did

[slide 21]

focus areas
[skipping "What we said" slides]

 
[slide 26]

this was one of the most productive quarters for Research and Data

 
[slide 27]

in this quarter, turned pageview (PV) definition draft into implementation
session-based metrics mostly for mobile team
then handed off to developers
this is our general pipeline
ErikM: regarding data trust:
old definition does not distinguish crawler traffic from "human" traffic, was ambiguous
e.g. presentation at December metrics meeting was based on new definition
comfortable about that
big remaining issue: unique users - we still rely on comScore for that
Dario: in summer, majority of pageviews in US was from automated traffic
old definition would not have caught that

 
[slide 28]

for FR, at the beginning of the quarter we weren't sure we could use Ellery's new tooling already
but used it successfully

 
[slide 29]

Andrew: What's a traffic researcher?
Toby, Dario: about readers, e.g. "how many social media referrals?"

Other key accomplishments

[slide 30]

What's next

[slide 33]

revscoring: already exists, want to move from standalone to service, also used by community

 
[slide 34]

Toby: for FR, instead of maximizing money, minimize annoyance (e.g. measured by impressions per client) for a given goal

 
[slide 35]

Asks

Toby: Ops issues
Damon: Labs, or Ops?
Toby: Ops including Labs
threat to stability and accuracy of our data
Labs is great environment, just need to make it more stable, or we will need to ...
Dumps are important for us, maintained by just one person, bus factor
Damon: where is search in team's work?
Toby: did one project
ErikM: get external search monitoring going in next few weeks
search analytics goes back to general issues
Damon: my priorities:

  • need to understand users
  • make sure VE is successful
  • learn about search

Dario: for apps, we did some search analytics
Leila: will have some in ...
Andrew:...
Damon: e.g. "what types of searches are happening?"