Movement Insights/Movement metrics process

This is how to produce the monthly movement metrics report.

Wait for dependencies

Normally, you only have to wait for these dependencies to arrive. Sometimes, however, a failure delays one of them and leaves you blocked. In that case, contact the people responsible for that dataset and ask them to fix the problem.

Most of these dependencies are produced by Airflow jobs. To check their status, follow the instructions at wikitech:Data Engineering/Systems/Airflow/Instances#Access.

Datasets owned by other teams

| dataset | expected arrival (day of the month) | Airflow instance | Airflow job name | notes |
|---|---|---|---|---|
| mediawiki_history | day 3-5 | analytics | mediawiki_history_denormalize | We receive an email alert when it is done (T357472) |
| editors_daily | | analytics | editors_daily_monthly | |
| pageview_hourly | | analytics | pageview_hourly | |
| virtualpageview_hourly | | analytics | virtualpageview_hourly | |
| net new pages API | day 5-10 (to check, see if contributor and content data for the new month is available on Wikistats) | | | The update process is currently manual, and often gets delayed or forgotten. The Data Products team is working on automating it (T355536) |
| wmf_readership.unique_devices_per_project_family_monthly | | analytics | unique_devices_per_project_family_monthly | |
| mediawiki_wikitext_history | day 10-12 | analytics | mediawiki_wikitext_history | Used to generate research.article_features and research.article_quality_scores |
| research.article_features, research.article_quality_scores | day 11-13 | research | research_article_quality | Used to generate content gaps data |
| content_gap_metrics.by_category_all_wikis | day 11-13 | research | knowledge_gaps | The notebooks can be safely re-run to incorporate these without affecting previously generated metrics |

Datasets owned by Movement Insights

Our movement_metrics job, which is scheduled to run on day 7 of the month, generates the following intermediate datasets.

  • wmf_product.active_editors
  • wmf_product.content_interactions
  • wmf_product.global_markets_pageviews
  • wmf_product.editor_month
  • wmf_product.new_editors
  • wmf_product.pageviews_corrected

Run the notebooks

Run the movement-metrics notebooks using the instructions in the readme. Once you push the updates to GitLab, our metric spreadsheet will automatically pick up the newly calculated values.

To push the updates to GitLab, change your working directory to the cloned 'movement-metrics' folder and execute the following git commands:

git add .
git commit -m "Update [Month] [Year] metrics"
git push

To authenticate your push to the repository, you will need to supply your GitLab username and the GitLab access token you generated.
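If you would rather not type the commit message by hand each month, the same three git commands can be wrapped in a small script. The following is a minimal sketch, not part of the movement-metrics repository: the function names are hypothetical, and you pass in whichever month and year you want to appear in the message.

```python
import calendar
import subprocess


def commit_message(year: int, month: int) -> str:
    """Fill in the "Update [Month] [Year] metrics" template.

    calendar.month_name gives English names under Python's default locale.
    """
    return f"Update {calendar.month_name[month]} {year} metrics"


def git_commands(year: int, month: int) -> list[list[str]]:
    """Return the three git commands listed above, in order."""
    return [
        ["git", "add", "."],
        ["git", "commit", "-m", commit_message(year, month)],
        ["git", "push"],
    ]


def push_updates(repo_dir: str, year: int, month: int) -> None:
    """Run the commands inside repo_dir, stopping at the first failure.

    repo_dir is assumed to be the path of your cloned movement-metrics folder.
    """
    for command in git_commands(year, month):
        subprocess.run(command, cwd=repo_dir, check=True)
```

For example, `push_updates("movement-metrics", 2024, 1)` would commit with the message "Update January 2024 metrics". When run from a terminal, git still prompts for your GitLab username and access token as usual.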

  1. Assess reports and investigate noteworthy trends
  2. Copy the key graphs to slides in the Prep - Movement Metrics deck
  3. Draft a summary message in the Summary draft - Movement Metrics doc
  4. Have Omari and other team members review it

Distribute the report

  1. Move finished slides to the Movement Metrics deck.
  2. Share the update in the #insights-and-data channel on Slack.
  3. Publish slides.
    1. Upload to Commons. Source: your own work. Author: Wikimedia Foundation Movement Insights Team. Copyright template: {{WMF-staff-upload}}
    2. Replace the previous report on Research and Decision Science with the new one.
    3. Add the previous report to Research and Decision Science/Movement Metrics.