WikiContrib/Proposed implementation
Background information
Wikimedia has numerous tools to gather mw:Development statistics, one of them being Bitergia's analytics tool. This tool provides useful information and is convenient for community managers who are familiar with how it works. For others, however, it is cumbersome: obtaining statistics for a topic takes too many steps, and there is a learning curve before one is comfortable with it. For example, scholarship committee reviewers who need developer contribution statistics while reviewing applications for Wikimedia events have to juggle different platforms such as GitHub, Gerrit, and Phabricator to view an applicant's activity before deciding.
Proposed workflow
The WikiContrib tool is currently a work in progress and aims to give a sneak peek into a developer’s contributions on Wikimedia platforms: Gerrit, Phabricator, and GitHub. Event organizers can log in to the app and retrieve the results in three steps.
- Organizers type in a list of users with their Wikitech/Gerrit, MediaWiki/Phabricator, and GitHub usernames, or provide the same data in CSV format.
- Organizers choose to filter the data by timestamp, commit status (merged, abandoned, declined), and project name.
- The tool displays the data in a tabular format that can be sorted by username or activity.
Anyone will be able to use this tool, but event organizers will be able to authenticate and gain access to additional features such as search history and uploading data in CSV format. As of now, the plan is to have an already authenticated user validate each newly registered user.
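As an illustration of the CSV input step, here is a minimal sketch of how an uploaded file could be parsed. The column names (`fullname`, `gerrit`, `phabricator`, `github`) and the `Contributor` type are assumptions made for this example; the proposal does not specify the CSV layout.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Contributor:
    """One row of the uploaded list (field names are hypothetical)."""
    fullname: str
    gerrit: str
    phabricator: str
    github: str


def parse_contributors(csv_text: str) -> list[Contributor]:
    """Parse CSV text with a header row into Contributor records.

    Missing columns are treated as empty strings and surrounding
    whitespace is stripped from every value.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    contributors = []
    for row in reader:
        contributors.append(
            Contributor(
                fullname=(row.get("fullname") or "").strip(),
                gerrit=(row.get("gerrit") or "").strip(),
                phabricator=(row.get("phabricator") or "").strip(),
                github=(row.get("github") or "").strip(),
            )
        )
    return contributors
```

Each parsed record could then drive one lookup per platform, skipping platforms for which the username is empty.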
The idea for the WikiContrib tool is inspired by AWMD stats, which generates monthly statistics of technical contributors to Wikimedia projects from Africa.
This tool will be available for use on Toolforge.
Note: The project was called Contraband during the development phase. After the initial development phase, it was renamed WikiContrib.
Mockups & wireframes
Mockups
Wireframes
Technical implementation
Fetch Gerrit and Phabricator contributions of a user
For Gerrit contributions, all changesets (new, merged, and abandoned) will be considered. For Phabricator, all tasks authored by a user and all tasks assigned to that user will be considered. The visualization will look as shown in mockup a).
[WIP Section] The solutions to both of the above questions were not devised by me. They are the way wikimedia.bitergia.io gets its statistics; I am just following its method.
For retrieving contributions, here are some identified solutions:
ElasticSearch
Request payload |
---|
{
"aggs":{
"2":{
"terms":{
"field":"status",
"order":{
"_count":"desc"
}
}
}
},
"query":{
"query_string":{
"query": "*Rammanojpotla"
}
}
}
|
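A minimal sketch of how the payload above could be built for an arbitrary username, and how the resulting terms buckets could be reduced to per-status counts. The function names are hypothetical. The payload shape is taken from the request above, and the bucket layout (`aggregations`, then the aggregation name, then `buckets` with `key` and `doc_count`) is the usual Elasticsearch terms-aggregation response.

```python
def build_status_query(username: str) -> dict:
    """Build the status-count payload shown above for a given username."""
    return {
        "aggs": {
            "2": {"terms": {"field": "status", "order": {"_count": "desc"}}}
        },
        "query": {"query_string": {"query": f"*{username}"}},
    }


def status_counts(response: dict) -> dict[str, int]:
    """Map each status key to its document count.

    Returns an empty dict when the aggregation is missing, for example
    when the query matched nothing.
    """
    buckets = response.get("aggregations", {}).get("2", {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Summing the values of the returned dict gives the overall contribution count without fetching the individual documents.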
Phabricator API
Request URL: https://phabricator.wikimedia.org/conduit/method/maniphest.search
Request payload |
---|
{
"constraints": {
"authors": [
"PHID-USER-utkozuokiv4qi3otfgny"
]
}
}
|
Notes:
- It is currently only possible to fetch "authored and assigned" and not "authored or assigned." For development purposes, only the authored count will be considered. For production use, separate API requests will fetch the authored and the assigned tasks, and their responses will then be merged.
- The output of the above API search is paginated, so API requests need to be repeated until all the results have been fetched.
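The two notes above can be sketched as follows: one loop that follows the pagination cursor, and one helper that merges the authored and assigned results without duplicates. This is only a sketch under stated assumptions. `fetch_page` is a hypothetical callable that performs one `maniphest.search` request and returns the `result` object of the response; the `authors` constraint key is copied from the payload above, while the `assigned` key is an assumption that should be checked against the Conduit documentation.

```python
from typing import Callable, Optional

# Hypothetical signature: fetch_page(constraints, after_cursor) -> result dict
FetchPage = Callable[[dict, Optional[str]], dict]


def fetch_all_tasks(fetch_page: FetchPage, constraints: dict) -> list[dict]:
    """Follow the cursor until no further page is reported."""
    tasks: list[dict] = []
    after: Optional[str] = None
    while True:
        result = fetch_page(constraints, after)
        tasks.extend(result.get("data", []))
        # An empty "after" cursor is treated as the end of the results.
        after = (result.get("cursor") or {}).get("after")
        if not after:
            return tasks


def authored_or_assigned(fetch_page: FetchPage, user_phid: str) -> list[dict]:
    """Merge authored and assigned tasks, keeping each task once."""
    authored = fetch_all_tasks(fetch_page, {"authors": [user_phid]})
    # "assigned" is an assumed constraint key, not taken from the proposal.
    assigned = fetch_all_tasks(fetch_page, {"assigned": [user_phid]})
    merged: dict[str, dict] = {}
    for task in authored + assigned:
        merged.setdefault(task["phid"], task)  # first occurrence wins
    return list(merged.values())
```

Passing the request function in as a parameter keeps the pagination logic independent of the HTTP client and makes it easy to exercise with canned pages.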
Gerrit API
Request URL: https://gerrit.wikimedia.org/r/changes/?q=owner:rammanoj&o=DETAILED_ACCOUNTS
Request Payload: None
Response |
---|
[
{
"id": "mediawiki%2Fextensions%2FParserFunctions~master~I5695a4cce0bfc92a047e611353c10640a299d2f0",
"project": "mediawiki/extensions/ParserFunctions",
"branch": "master",
"topic": "point",
"hashtags": [],
"change_id": "I5695a4cce0bfc92a047e611353c10640a299d2f0",
"subject": "Fix incorrect handling of strings with multiple decimal points",
"status": "ABANDONED",
"created": "2018-01-28 10:10:08.000000000",
"updated": "2018-06-12 18:03:59.000000000",
"insertions": 31,
"deletions": 2,
"unresolved_comment_count": 0,
"has_review_started": true,
"_number": 406485,
"owner": {
"_account_id": 4632,
"name": "Rammanojpotla",
"email": "rammanojpotla1608@gmail.com",
"username": "rammanoj"
}
},
{
"id": "mediawiki%2Fextensions%2FParserFunctions~master~Ida573d94d0df8862f3189bb9e9735decaa12eecf",
"project": "mediawiki/extensions/ParserFunctions",
"branch": "master",
"topic": "complex",
"hashtags": [],
"change_id": "Ida573d94d0df8862f3189bb9e9735decaa12eecf",
"subject": "Fix give errors on using complex number in {{#expr:}}",
"status": "ABANDONED",
"created": "2018-01-27 05:25:44.000000000",
"updated": "2018-06-12 17:59:18.000000000",
"insertions": 11,
"deletions": 3,
"unresolved_comment_count": 0,
"has_review_started": true,
"_number": 406391,
"owner": {
"_account_id": 4632,
"name": "Rammanojpotla",
"email": "rammanojpotla1608@gmail.com",
"username": "rammanoj"
}
}
... .... ....
]
|
Note: All the objects in the above response need to be counted to get the total number of contributions.
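A minimal sketch of that counting step, assuming the response body has already been downloaded as text. Gerrit's REST API prefixes JSON responses with the line `)]}'` to guard against cross-site script inclusion, so the prefix is stripped before parsing. The function names are hypothetical.

```python
import json
from collections import Counter

XSSI_PREFIX = ")]}'"


def parse_gerrit_body(body: str) -> list[dict]:
    """Strip the XSSI guard line, if present, and decode the JSON array."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def count_by_status(changes: list[dict]) -> Counter:
    """Count changes per status (for example MERGED or ABANDONED)."""
    return Counter(change["status"] for change in changes)
```

The total number of contributions is then `sum(count_by_status(changes).values())`, which equals `len(changes)` for a single page of results. If the server truncates the list, further pages would need to be fetched and counted as well.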
Retrieve user contributions for different dates and times
The visualization will look as shown in mockup b).
ElasticSearch
Request URL for fetching contributions from Gerrit: GET gerrit/_search?filter_path=took,hits.total,aggregations
Request payload |
---|
{
"aggs":{
"2":{
"date_histogram":{
"field":"grimoire_creation_date",
"interval":"1D",
"time_zone":"Asia/Kolkata",
"min_doc_count":1
},
"aggs":{
"3":{
"terms":{
"field":"status",
"size":4,
"order":{
"_count":"desc"
}
}
}
}
}
},
"query": {
"query_string": {
"query": "*Rammanojpotla"
}
}
}
|
Response |
---|
{
"took":468,
"hits":{
"total":198
},
"aggregations":{
"2":{
"buckets":[
{
"3":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"MERGED",
"doc_count":1
}
]
},
"key_as_string":"2017-03-27T00:00:00.000+05:30",
"key":1490553000000,
"doc_count":2
},
{
"3":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
]
},
"key_as_string":"2017-03-28T00:00:00.000+05:30",
"key":1490639400000,
"doc_count":3
},
{
"3":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"MERGED",
"doc_count":2
}
]
},
"key_as_string":"2017-04-14T00:00:00.000+05:30",
"key":1492108200000,
"doc_count":13
}
]
}
}
}
|
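To feed a per-day visualization, the nested buckets in the response above can be flattened into one row per day. The following is a minimal sketch; the function name and the row layout are hypothetical, while the aggregation names `"2"` and `"3"` are the ones used in the request payload.

```python
def daily_status_counts(response: dict) -> list[dict]:
    """Flatten the date_histogram response into one row per day.

    Each row holds the calendar date (taken from key_as_string), the
    total document count for that day, and a status -> count mapping.
    """
    rows = []
    for day in response["aggregations"]["2"]["buckets"]:
        statuses = {b["key"]: b["doc_count"] for b in day["3"]["buckets"]}
        rows.append(
            {
                "date": day["key_as_string"][:10],  # keep YYYY-MM-DD only
                "total": day["doc_count"],
                "statuses": statuses,
            }
        )
    return rows
```

Because the request sets `min_doc_count` to 1, days without activity are absent from the buckets, so a chart with a continuous date axis would have to fill those gaps itself.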
Request URL for Phabricator: GET maniphest/_search?filter_path=took,hits.total,aggregations. For Phabricator and Gerrit, the equivalent API calls will be made via the following APIs:
- maniphest.search for Phabricator.
- https://gerrit.wikimedia.org/r/changes/ for Gerrit.
Display user activity in a tabular format
The visualization will look as shown in mockup e).
ElasticSearch
Request URL: GET gerrit/_search
Request payload |
---|
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"*rammanoj"
}
},
{
"match_phrase":{
"created_on":{
"query":"2017-04-14"
}
}
}
]
}
}
}
|
Response |
---|
{
"took": 396,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2,
"hits": [
{
"_index": "gerrit_wikimedia_180406b_enriched_190527",
"_type": "items",
"_id": "383a235373831a130a06686ddca0a0a28306491a_changeset_348225",
"_score": 2,
"_source": {
"changeset_author_org_name": "Independent",
"author_name": "Rammanoj",
"timeopen": "21.00",
"grimoire_creation_date": "2017-04-14T14:21:12+00:00",
"patchsets": 5,
"closed": "2017-05-05T14:24:07+00:00",
"owner_bot": false,
"owner_uuid": "1a76fb77b4bd3fcbda44de685cf4a0739dfe37fe",
"changeset_author_uuid": "1a76fb77b4bd3fcbda44de685cf4a0739dfe37fe",
"type": "changeset",
"demography_min_date": "2017-03-27T18:02:55.000Z",
"name": "Rammanojpotla",
"author_bot": false,
"changeset_author_gender": "Unknown",
"project": "Wikimedia",
"githash": "Icc487bc6932027e4652dc24743c664c245e0222b",
"owner_domain": "gmail.com",
"opened": "2017-04-14T14:21:12+00:00",
"owner_id": "bdc986f25cd05e52add2651b214c3f7a22ac5d3a",
"last_updated": "2017-05-05T14:24:07+00:00",
"metadata__filter_raw": null,
"cm_title": "wikimedia",
"author_id": "bdc986f25cd05e52add2651b214c3f7a22ac5d3a",
"changeset_author_user_name": "",
"is_gerrit_review": 1,
"owner_org_name": "Independent",
"author_gender_acc": 0,
"author_user_name": "",
"author_uuid": "1a76fb77b4bd3fcbda44de685cf4a0739dfe37fe",
"origin": "gerrit.wikimedia.org",
"metadata__timestamp": "2017-05-05T14:26:51.023552+00:00",
"changeset_author_id": "bdc986f25cd05e52add2651b214c3f7a22ac5d3a",
"uuid": "383a235373831a130a06686ddca0a0a28306491a",
"changeset_author_bot": false,
"summary_analyzed": "Ruby gem documentation should state license",
"created_on": "2017-04-14T14:21:12+00:00",
"owner_gender_acc": 0,
"changeset_author_gender_acc": 0,
"is_gerrit_changeset": 1,
"changeset_author_domain": "gmail.com",
"domain": "gmail.com",
"url": "https://gerrit.wikimedia.org/r/348225",
"changeset_author_name": "Rammanoj",
"summary": "Ruby gem documentation should state license",
"status": "MERGED",
"changeset_number": "348225",
"metadata__enriched_on": "2019-05-28T03:58:21.723693+00:00",
"offset": null,
"metadata__gelk_backend_name": "GerritEnrich",
"demography_max_date": "2019-03-19T05:29:56.000Z",
"author_gender": "Unknown",
"metadata__gelk_version": "0.54.0",
"owner_gender": "Unknown",
"author_org_name": "Independent",
"repository": "mediawiki/ruby/api",
"owner_user_name": "rammanoj",
"owner_name": "Rammanoj",
"metadata__updated_on": "2017-05-05T14:24:07+00:00",
"project_1": "Wikimedia",
"tag": "gerrit.wikimedia.org",
"branch": "master",
"id": "383a235373831a130a06686ddca0a0a28306491a_changeset_348225",
"author_domain": "gmail.com"
}
}
...
]
}
}
|
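Only a few of the fields in each hit are needed for a table. The sketch below, under the assumption that the columns of interest are `created_on`, `repository`, `summary`, `status`, and `url` (all present in the `_source` object above), reduces the hits to table rows and sorts them. The function names and the column selection are hypothetical.

```python
COLUMNS = ("created_on", "repository", "summary", "status", "url")


def hits_to_rows(response: dict, columns: tuple = COLUMNS) -> list[dict]:
    """Keep only the selected _source fields of every hit.

    Fields missing from a hit are filled with None so that every row
    has the same columns.
    """
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        rows.append({column: source.get(column) for column in columns})
    return rows


def sort_rows(rows: list[dict], key: str, descending: bool = False) -> list[dict]:
    """Sort rows by one string column, treating missing values as empty."""
    return sorted(rows, key=lambda row: row.get(key) or "", reverse=descending)
```

Sorting by `created_on` works on the raw strings here because the timestamps in the response share one ISO 8601 format and offset.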
Request URL for Phabricator: GET maniphest/_search?filter_path=took,hits.total,aggregations. For Phabricator and Gerrit, the equivalent API calls will be made via the following APIs:
- maniphest.search for Phabricator.
- https://gerrit.wikimedia.org/r/changes/ for Gerrit.
Benefits of using ElasticSearch over Gerrit and Phabricator APIs
With the Phabricator API, there is a limit of 100 objects per call, and the calls cannot be made in parallel. With Bitergia, the contribution count can be fetched easily, and the response is retrieved much more quickly with a single API request. This would also allow displaying data to the user in real time.