Wikilytics Plugins
Wikilytics consists of two parts:
- The data chain that downloads, extracts, and stores a Wikipedia dump file into a database.
- The dataset functionality that runs a query against the dataset built in phase 1.
This page documents the inner workings of the dataset functionality, and in particular how to use plugins and how to write your own.
Running a plugin
The generic way to start a plugin is as follows:
python manage.py dataset -c name_of_plugin -k keyword1=value1,keyword2=value2
You can specify the granularity of the observations. By default, observations are aggregated by year, for example the number of new editors in a given year. But you can also do this:
python manage.py dataset -c new_editor_count -k time_unit=month
or even
python manage.py dataset -c new_editor_count -k time_unit=day
In the first case, you will break down the number of new editors to a monthly level and in the second case you can even get counts at a daily level. This option applies to all Wikilytics plugins.
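The effect of the time_unit option can be illustrated with a minimal sketch. This is not the Wikilytics implementation; the function names `bucket_key` and `count_by_time_unit` are hypothetical and only show how the same observations could be grouped at yearly, monthly, or daily granularity.

```python
from collections import Counter
from datetime import date


def bucket_key(day, time_unit="year"):
    """Return the aggregation key for one observation date (hypothetical helper)."""
    if time_unit == "year":
        return (day.year,)
    if time_unit == "month":
        return (day.year, day.month)
    if time_unit == "day":
        return (day.year, day.month, day.day)
    raise ValueError("unknown time_unit: %r" % (time_unit,))


def count_by_time_unit(dates, time_unit="year"):
    """Count observations per bucket; the default mirrors the yearly default above."""
    return Counter(bucket_key(d, time_unit) for d in dates)


# Example: three new editors, grouped by month
first_edits = [date(2010, 1, 5), date(2010, 1, 20), date(2010, 3, 1)]
monthly = count_by_time_unit(first_edits, "month")
```

With a finer time unit the same observations are simply spread over more buckets, so the totals across buckets stay the same.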
Most plugins do not need the -k option, but it will give you an additional level of control over the output of the plugin.
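As a rough illustration of the keyword1=value1,keyword2=value2 format passed to -k, the sketch below splits such a string into a dictionary. The function `parse_keywords` is hypothetical and is not taken from the Wikilytics source. It assumes that values do not themselves contain commas, so it would not handle a value such as a comma-separated namespace list.

```python
def parse_keywords(raw):
    """Split 'key1=value1,key2=value2' into a dict of strings (hypothetical helper)."""
    keywords = {}
    for pair in raw.split(","):
        if not pair:
            continue  # tolerate an empty string or a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("expected keyword=value, got %r" % (pair,))
        keywords[key.strip()] = value.strip()
    return keywords


options = parse_keywords("time_unit=month,cutoff=5")
# All values are strings here; a plugin would still need to convert
# numeric options such as cutoff with int().
```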
Generic Plugins
Generic plugins answer questions about high-level trends in a Wikipedia project.
Plugin name | Plugin description |
---|---|
new_editor_count | This is the default plugin that will run if you do not explicitly call for another plugin. This plugin counts the number of New Wikipedians for every year / month combination since the start of the project being analyzed. |
time_to_new_wikipedian | This plugin calculates, for each New Wikipedian, how many days it took to become a New Wikipedian (generally defined as someone who has made 10 edits). |
active_editor_count | You can invoke this plugin as follows: python manage.py dataset -c active_editor_count -k time_unit=month,cutoff=5. This will count for every year/month combination the number of editors who made at least 5 edits in that given month/year. |
histogram_edits | This plugin creates a dataset that can be visualized as a histogram. You can invoke this plugin by entering: python manage.py dataset -c histogram_edits -k time_unit=month,namespace=1,2. This will create a csv file that contains, for each namespace/month/year combination, the frequency of each number of edits. |
total_cumulative_edits | You can invoke this plugin as follows: python manage.py dataset -c total_cumulative_edits -k namespace=1;2;3;4,time_unit=month. This will count the number of edits for a given namespace/month/year combination. This data can then be used to create a line chart. The namespace keyword is optional; if you do not specify it, the main namespace is assumed. |
total_number_of_articles | Does not work yet. |
total_number_new_wikipedians | This plugin counts the number of New Wikipedians in a given time unit (choices are year, month and day). A New Wikipedian is a person who made at least 10 edits. This plugin has no other optional arguments. |
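
The counting rules described in the table for active_editor_count and time_to_new_wikipedian can be sketched as follows. This is only an illustration under stated assumptions, not the plugins' actual code: the function names and the input shape (editor, date pairs) are hypothetical, and the sketch assumes that the days to become a New Wikipedian are measured from an editor's first edit to the edit that reaches the threshold.

```python
from collections import Counter
from datetime import date


def active_editor_count(edits, cutoff=5):
    """Return {(year, month): number of editors with at least `cutoff` edits}.

    `edits` is an iterable of (editor, date) pairs (assumed input shape).
    """
    per_editor_month = Counter((editor, d.year, d.month) for editor, d in edits)
    active = Counter()
    for (editor, year, month), n_edits in per_editor_month.items():
        if n_edits >= cutoff:
            active[(year, month)] += 1
    return dict(active)


def days_to_new_wikipedian(edit_dates, threshold=10):
    """Days from the first edit to the threshold-th edit, or None if never reached.

    Measuring from the first edit is an assumption of this sketch.
    """
    ordered = sorted(edit_dates)
    if len(ordered) < threshold:
        return None
    return (ordered[threshold - 1] - ordered[0]).days


# Editor "a" makes five edits in January 2010, editor "b" only four.
edits = [("a", date(2010, 1, d)) for d in range(1, 6)]
edits += [("b", date(2010, 1, d)) for d in range(1, 5)]
counts = active_editor_count(edits, cutoff=5)
```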
Editor Trends Study Plugins
Plugin name | Plugin description |
---|---|
ets_cohort_backward_bar | To be added |
ets_cohort_backward_histogram | To be added |
ets_cohort_forward_bar | To be added |
ets_cohort_forward_histogram | To be added |
Taxonomy Plugins
- More at Contribution Taxonomy Project
Plugin name | Plugin description |
---|---|
taxonomy_burnout | To be added |
taxonomy_list_makers | To be added |
Plugins in Development
Plugin name | Plugin description |
---|---|
edit_patterns | The purpose of this plugin is to identify the most common editing patterns of Wikimedians. An editing pattern shows the monthly sequence of activity and inactivity. The output consists of one row per editor per year containing True and False values. True indicates that the editor made more than cutoff edits in that month, and False means the editor did not reach the cutoff value. |
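
Since edit_patterns is still in development, the following is only a sketch of the output shape described in the table, not the plugin itself. The function name and the input format (editor, year, month tuples, one per edit) are assumptions made for illustration. Each editor/year combination maps to twelve booleans, one per month, where True means the editor made more than cutoff edits in that month.

```python
from collections import Counter


def edit_patterns(edits, cutoff=0):
    """Return {(editor, year): [bool] * 12} from (editor, year, month) tuples.

    One tuple per edit is assumed. A month is True only when the editor made
    strictly more than `cutoff` edits in it.
    """
    counts = Counter(edits)
    editor_years = {(editor, year) for editor, year, _month in counts}
    return {
        (editor, year): [counts[(editor, year, month)] > cutoff
                         for month in range(1, 13)]
        for editor, year in editor_years
    }


# Two edits in January and one in March, with a cutoff of 1:
# only January exceeds the cutoff.
patterns = edit_patterns([("a", 2010, 1), ("a", 2010, 1), ("a", 2010, 3)], cutoff=1)
```

Rows produced this way can be compared directly, so counting identical rows would give the most common patterns.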