Grants:Project/Rapid/Abián/Study on Wikidata property constraints

status: Funded
Abián/Study on Wikidata property constraints
quantitative and qualitative analysis of how Wikidata property constraints are working and what can be done to improve their effectiveness
target: Wikidata
start date: November 16
end date: January 31
budget (local currency): 1,350.00 EUR
budget (USD): 1,473.88 USD
grant type: individual
grantee: Abián


Project Goal

Briefly explain what you are trying to accomplish with this project, or what you expect will change as a result of this grant. Example goals include "recruit new editors", "add high quality content", or "train existing editors on a specific skill".

Data quality is identified as a main concern by the three strategic reports on Wikidata, and as one of the keys to the success of the Wikimedia movement in the years to come. On Wikidata, the most important system through which the quality of statements is controlled is Wikibase property constraints.
  • Wikibase property constraints or quality constraints, technically the MediaWiki extension WikibaseQualityConstraints, have grown reactively, rather than proactively, to comply with the suggestions and ideas of different editors and developers over the years. Today the extension implements more than 30 constraint types and 3 severity levels, but it is not clear how all these constraints are used on Wikidata or how developers and editors can make them more effective in order to improve data quality.
  • This project aims to fill some of these knowledge gaps around property constraints (e.g., how many constraints are defined for each constraint type, when they are underused, overused or misused, how they are clustered, what correlations they hide, how and when the severity levels are used, how some constraint types can be improved, whether there are too many constraint types or too few, whether they meet the needs of all the data types, whether users know how they work, how often subproperties have the constraints of their parents, etc.) with useful, comprehensive, evidence-based conclusions and recommendations for both developers and editors of Wikidata, with the ultimate goal of improving data quality on Wikidata.
  • This project does not aim to study constraint violations, user interfaces, EntitySchemas, the quality of the code or the software architecture of the MediaWiki extension, the Lexeme namespace or how constraints change over time.
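As an illustration of one of the quantitative questions listed above (how many constraints are defined for each constraint type), the following is a minimal sketch and not the method used in the study. It assumes the constraint definitions are read from "property constraint" (P2302) statements through the Wikidata Query Service, whose JSON results contain one binding per row. The function name, the variable names and the sample rows are hypothetical, and the sample URIs are placeholders rather than real constraint type items.

```python
from collections import Counter

# Query text intended for the Wikidata Query Service, which predefines the
# p: and ps: prefixes. Each result row is one constraint statement
# (P2302, "property constraint") on one property.
CONSTRAINT_STATEMENTS_QUERY = """
SELECT ?property ?constraintType WHERE {
  ?property p:P2302 ?statement .
  ?statement ps:P2302 ?constraintType .
}
"""


def count_constraints_per_type(bindings):
    """Tally constraint statements per constraint type.

    `bindings` is the list found under results -> bindings in a
    SPARQL JSON response, i.e. dicts of {"variable": {"value": ...}}.
    """
    counts = Counter()
    for row in bindings:
        counts[row["constraintType"]["value"]] += 1
    return counts


# Hypothetical sample rows; the URIs are placeholders, not real items.
sample_bindings = [
    {"property": {"value": "urn:example:property-1"},
     "constraintType": {"value": "urn:example:type-a"}},
    {"property": {"value": "urn:example:property-2"},
     "constraintType": {"value": "urn:example:type-a"}},
    {"property": {"value": "urn:example:property-2"},
     "constraintType": {"value": "urn:example:type-b"}},
]

per_type = count_constraints_per_type(sample_bindings)
for constraint_type, n in per_type.most_common():
    print(constraint_type, n)
```

Counting in Python rather than with a SPARQL `GROUP BY` keeps the per-property rows available for the other questions, such as how constraint types cluster on the same property.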

Project Plan

Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing?

  • Decide what data to gather
  • Gather and prepare Wikidata data for analysis
  • Analyze and explore Wikidata data with quantitative and qualitative methods
  • Design, prepare and distribute surveys to certain Wikidata users
  • Analyze and explore survey data and cross it with Wikidata data
  • Draw conclusions and recommendations
  • Write and upload the results as a report with a CC BY-SA license

How will you let others in your community know about your project (please provide links to where relevant communities have been notified of your proposal, and to any other relevant community discussions)? Why are you targeting a specific audience?

Feedback will be informally gathered from volunteers and staff working on Wikidata. Additional surveys will be distributed to certain editors of different categories so that the sampling methods are fair and the views are representative. Deciding how to distribute these surveys is part of one of the project tasks. No personal information will be collected through these surveys.
The conclusions and recommendations, along with the rest of the report, will be uploaded to Wikimedia Commons or published as wikitext directly on one or more Wikidata pages. Both the community and the staff working on Wikidata will be notified so that they can make decisions about the conclusions and recommendations of the study and improve property constraints accordingly.

What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

At the end of the project, the report about the state of Wikidata property constraints will be published with a CC BY-SA license. This report will contain semi-processed data as well as high-level conclusions and recommendations to improve property constraints. Everything will be announced and linked at least from the talk page of the "WikiProject property constraints" and the project chat, and some of the content will later be used to improve the existing documentation on property constraints. I will keep in touch with the other editors and with the staff working on Wikidata, resolve any doubts they may have in this regard and perhaps open Phabricator tasks for the technical recommendations. Of course, I will continue contributing to Wikidata and trying to improve data quality.

Impact

How will you know if the project is successful and you've met your goals? Please include the following targets and feel free to add more specific to your project:

  1. At least 12 users with different levels of experience and involvement in Wikidata have been surveyed formally throughout the development of the project.
  2. The use of at least 20 constraint types has been analyzed quantitatively.
  3. The staff working on Wikidata confirm that the results of the study let them gain a comprehensive, evidence-based understanding of how property constraints are used on Wikidata and how to improve them.
  4. At least 2 recommendations suggested by the study are applied by Wikidata editors shortly after the publication of the report.

Resources

What resources do you have? Include information on who is organizing the project, what they will do, and if you will receive support from anywhere else (in-kind donations or additional funding).

The grantee will be
Feedback will be gathered from volunteers and staff working on Wikidata. Since no additional funding or donations are expected, the completion of the project is conditional on the approval of this Rapid Grant.

What resources do you need? For your funding request, list bullet points for each expense:

  • 11 weeks for part-time data collection and analysis, survey preparation and report writing: 1,350.00 EUR
  • Task management on Phabricator, edits on Wikidata for problem solving and other complementary tasks: 0 EUR (volunteer actions)
  • Possible contingencies: 0 EUR (assumed by the grantee)

Endorsements

  • As the primary maintainer of the WikibaseQualityConstraints extension, I think this will provide very valuable information for future development of this important feature. Lucas Werkmeister (WMDE) (talk) 11:47, 24 September 2019 (UTC)
  • As someone interested in using Wikidata as a KB, I strongly endorse this proposal. I would like there to be some way to gather information on what participants think are missing from the current constraint system, perhaps by showing different kinds of constraints (including both ones that can and cannot currently be done) and asking how important participants think each one is. One comparison that might be helpful is the perceived importance of disjunctive global domains and ranges vs relative ranges. Peter F. Patel-Schneider (talk) 16:15, 24 September 2019 (UTC)
  • Data collection seems doable; not very complex, but someone would have to spent some time to collect it, evaluate it, and prepare the report. I would be interested to read it, and it seems to be worth the money they asked for… —MisterSynergy (talk) 16:38, 24 September 2019 (UTC)
  • As a Wikidata editor who spends a lot of time working with constraints, I would be very interested in participating in this research project and seeing the results. - PKM (talk) 18:53, 24 September 2019 (UTC)
  • It'd be incredibly useful for me and my team to have this analysis done. As Wikidata grows and is used in more and more places the pressure to keep its data quality high is rightfully rising. The constraints system is one crucial part of that. Since its development we've not had a chance to take a step back and understand how exactly it is currently used and where it falls short. The result of this grant would be a valuable basis for continuing to improve the constraints system. --Lydia Pintscher (WMDE) (talk) 10:47, 26 September 2019 (UTC)