Movement Insights/Movement metrics process
This is how to produce the monthly movement metrics report.
Wait for dependencies
Normally, you only have to wait for these dependencies to arrive. Occasionally a job fails and you end up blocked waiting on one of them. In that case, contact the people responsible for the dataset and ask them to fix the problem.
Most of these dependencies are produced by Airflow jobs. To check their status, follow the instructions at wikitech:Data Engineering/Systems/Airflow/Instances#Access.
Datasets owned by other teams
dataset | expected arrival (day of the month) | Airflow instance | Airflow job name | notes |
---|---|---|---|---|
mediawiki_history | day 3-5 | analytics | mediawiki_history_denormalize | We receive an email alert when it is done (T357472) |
editors_daily | | analytics | editors_daily_monthly | |
pageview_hourly | | analytics | pageview_hourly | |
virtualpageview_hourly | | analytics | virtualpageview_hourly | |
net new pages API | day 5-10 (to check, see if contributor and content data for the new month is available on Wikistats) | | | The update process is currently manual, and often gets delayed or forgotten. The Data Products team is working on automating it (T355536) |
wmf_readership.unique_devices_per_project_family_monthly | | analytics | unique_devices_per_project_family_monthly | |
mediawiki_wikitext_history | day 10-12 | analytics | mediawiki_wikitext_history | Used to generate research.article_features and research.article_quality_scores |
research.article_features, research.article_quality_scores | day 11-13 | research | research_article_quality | Used to generate content gaps data |
content_gap_metrics.by_category_all_wikis | day 11-13 | research | knowledge_gaps | The notebooks can be safely re-run to incorporate these without affecting previously generated metrics |
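The arrival windows in the table above can be turned into a quick check for which dependencies are overdue on a given day. The following is a minimal sketch, not existing tooling: the names `EXPECTED_ARRIVAL` and `overdue_datasets` are hypothetical, and the set of datasets that have already arrived is assumed to be supplied by hand after checking Airflow. Only datasets with an explicitly stated window are included.

```python
from datetime import date

# (first day, last day) of the expected arrival window, copied from the table above.
# Datasets whose window is not stated in the table are left out.
EXPECTED_ARRIVAL = {
    "mediawiki_history": (3, 5),
    "net new pages API": (5, 10),
    "mediawiki_wikitext_history": (10, 12),
    "research.article_features": (11, 13),
    "research.article_quality_scores": (11, 13),
    "content_gap_metrics.by_category_all_wikis": (11, 13),
}


def overdue_datasets(today, arrived):
    """Return the datasets whose window has passed but which have not arrived."""
    return sorted(
        name
        for name, (_, last_day) in EXPECTED_ARRIVAL.items()
        if today.day > last_day and name not in arrived
    )


if __name__ == "__main__":
    # Example: on day 11, with only mediawiki_history confirmed as arrived.
    print(overdue_datasets(date(2024, 3, 11), arrived={"mediawiki_history"}))
```

Anything this returns is a candidate for following up with the responsible team. A dataset that is still inside its window is not reported.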
Datasets owned by Movement Insights
Our movement_metrics job, which is scheduled to run on day 7 of the month, generates the following intermediate datasets.
- wmf_product.active_editors
- wmf_product.content_interactions
- wmf_product.global_markets_pageviews
- wmf_product.editor_month
- wmf_product.new_editors
- wmf_product.pageviews_corrected
Run the notebooks
Run the movement-metrics notebooks using the instructions in the readme. Once you push the updates to GitLab, our metric spreadsheet will automatically pick up the newly calculated values.
To push the updates to GitLab, change your working directory to the cloned 'movement-metrics' folder and execute the following git commands:
git add .
git commit -m "Update [Month] [Year] metrics"
git push
To authenticate your push to the repository, you will need to supply your GitLab username and your generated GitLab access token.
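If you prefer to script the push, the steps above can be wrapped in a few lines of Python. This is a sketch under stated assumptions rather than part of the team's tooling: the function names are hypothetical, and which month to put in the commit message is left to the caller. Git still prompts for your GitLab username and access token in the terminal when the push runs.

```python
import subprocess
from datetime import date


def commit_message(report_month):
    """Fill the "Update [Month] [Year] metrics" template for a given month."""
    # %B is the full month name; it is English unless the locale was changed.
    return f"Update {report_month.strftime('%B %Y')} metrics"


def git_commands(message):
    """Return the three git commands from the steps above as argument lists."""
    return [
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def push_update(repo_dir, report_month):
    """Run the commands inside the cloned movement-metrics folder (repo_dir)."""
    for cmd in git_commands(commit_message(report_month)):
        # check=True stops at the first failing command instead of continuing.
        subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for cmd in git_commands(commit_message(date(2024, 3, 1))):
        print(" ".join(cmd))
```

Calling `push_update("path/to/movement-metrics", date(2024, 3, 1))` would run the real commands; the path here is a placeholder for wherever you cloned the repository.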
Analyze the trends and prepare the report
- Assess reports and investigate noteworthy trends
- Copy the key graphs to slides in the Prep - Movement Metrics deck
- Draft the summary message in the Summary draft - Movement Metrics doc
- Have Omari and other team members review it
Distribute the report
- Move finished slides to the Movement Metrics deck.
- Share the update in the #insights-and-data channel on Slack
- Publish slides.
- Upload to Commons. Source: your own work. Author: Wikimedia Foundation Movement Insights Team. Copyright template: {{WMF-staff-upload}}
- Replace the previous report on Research and Decision Science with the new one.
- Add the previous report to Research and Decision Science/Movement Metrics.