Grants:Project/Rapid/Theredproject/WikidataQuickSheets

status: funded
Theredproject/Wikidata QuickSheets
Develop software that will semi-automate the process of moving data from Wikipedia to Wikidata, with reliable sources.
target: Wikidata, *.Wikipedia
start date: May 1
end date: August 31
budget (local currency): 1995
budget (USD): 1995
grant type: individual
grantee: Theredproject



Project Goal


Briefly explain what you are trying to accomplish with this project, or what you expect will change as a result of this grant. Example goals include, "recruit new editors", "add high quality content", or "train existing editors on a specific skill".

Building on work done as part of the Art+Feminism campaign, we will further develop software that semi-automates the process of moving data from Wikipedia to Wikidata. We have designed this tool to be accessible to people without programming experience. It uses simple article lists to generate spreadsheets for human evaluation; the reviewed sheets are then transformed back into QuickStatements-ready data. The script requires no special libraries or dependencies beyond what a default Python installation provides. Because a human reviews every claim, it is similar to the Wikidata Game, but it differs in three key ways:
  1. You can start with your own very focused list
  2. It does more of the work for you
  3. It lays the data out in a spreadsheet format, so you can scan and approve it faster and at scale.
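The last step of that workflow, turning a reviewed sheet back into QuickStatements input, can be sketched as below. This is a minimal illustration under assumptions, not the QuickSheets code: it assumes the sheet is exported as CSV with hypothetical columns `qid`, `property`, `value` and `approved`, and it emits QuickStatements V1 tab-separated commands sourced to English Wikipedia (Q328). The default source property is "imported from Wikimedia project" (P143); "stated in" (P248) can be passed instead.

```python
import csv
import io

# Marks a reviewer might type in the hypothetical "approved" column.
APPROVED_MARKS = {"y", "yes", "x", "1"}


def sheet_to_quickstatements(csv_text, source_prop="S143", source_value="Q328"):
    """Turn approved rows of a reviewed sheet into QuickStatements V1 commands.

    Assumes hypothetical columns: qid, property, value, approved.
    Default source: P143 ("imported from Wikimedia project") = Q328 (English
    Wikipedia); pass source_prop="S248" to use "stated in" (P248) instead.
    """
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip anything the human reviewer did not explicitly approve.
        if (row.get("approved") or "").strip().lower() not in APPROVED_MARKS:
            continue
        # V1 syntax: tab-separated item, property, value, then source property/value.
        commands.append("\t".join([
            row["qid"].strip(),
            row["property"].strip(),
            row["value"].strip(),
            source_prop,
            source_value,
        ]))
    return "\n".join(commands)


if __name__ == "__main__":
    # Placeholder item and value IDs, for illustration only.
    sheet = "qid,property,value,approved\nQ100,P106,Q300,y\nQ200,P106,Q400,n\n"
    print(sheet_to_quickstatements(sheet))
```

Only the first row is approved, so the output is a single command that can be pasted into QuickStatements.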
At present the software works in a limited capacity for Art+Feminism-specific goals: moving occupation (P106) data from enwiki to Wikidata, as stated in enwiki. This project would make several key improvements that would turn it into a valuable tool for the wider community. The initial use case is adding reliable sources to ethnic group (P172) statements. At present there are 50,000+ such statements at risk of deletion because they do not have reliable sources; the largest group among them, 14,000, is for African Americans (Q49085), likely because of the Wikidata Game. For more on the context of this problem, please see here: [1]
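Counts like these can be checked against the Wikidata Query Service. The sketch below is a hypothetical helper, not part of QuickSheets: it builds a SPARQL query counting statements of a property that carry no reference at all, grouped by value. It will not catch statements whose only reference is a Wikipedia import, so its totals may differ from the figures cited above. `run_query` needs network access, and the User-Agent string shown is a placeholder.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def unreferenced_statements_query(prop="P172", limit=20):
    """Build SPARQL counting `prop` statements with no reference, grouped by value."""
    # Doubled braces are literal braces in the f-string.
    return f"""SELECT ?value (COUNT(?statement) AS ?count) WHERE {{
  ?item p:{prop} ?statement .
  ?statement ps:{prop} ?value .
  FILTER NOT EXISTS {{ ?statement prov:wasDerivedFrom ?ref . }}
}}
GROUP BY ?value
ORDER BY DESC(?count)
LIMIT {limit}"""


def run_query(query, user_agent="QuickSheetsSketch/0.1 (placeholder contact)"):
    """Send a query to the Wikidata Query Service and return its JSON bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Print the query only; calling run_query() requires network access.
    print(unreferenced_statements_query("P172", limit=10))
```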
I have written up documentation of how the current version of Wikidata QuickSheets works here: [2]. It has two key limitations: it only works for P106 data, and it merely sources claims to "stated in English Wikipedia". While not ideal, this works for P106 data, but not for P172 data. To work for P172 and other data that raise BLP privacy concerns, the tool must be able to add a reliable source.

Project Plan


Activities


Tell us how you'll carry out your project. What will you and other organizers spend your time doing?

Our work will be software development and documentation. Adding sourcing capabilities requires the following improvements to the code:
  • It would need to be able to work from a list of QIDs (possibly via PagePile?)
  • It would need to pull references from the Wikipedia entry, and search for the property values in the text of the reference.
    • One challenge here would be searching for all the terms associated with a specific ethnicity; it may be possible to do this via Wikidata aliases
  • It would need to be abstracted so that you could configure it to accept any property input, not just P106 or P172
  • Because of the potential for complex statements, it should be migrated to output in QuickStatements2 format
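The reference-search step could look roughly like the sketch below. The function names are hypothetical, and the caller is assumed to supply the reference text (for example, fetched from a citation URL in the Wikipedia article) together with the value's label and aliases (for example, from Wikidata). A match yields a QuickStatements V1 command citing the reference URL (P854) and retrieval date (P813); V1 syntax is used here for brevity, and the finished tool might emit QuickStatements 2 CSV instead.

```python
import re
from datetime import date


def find_mentions(text, label, aliases=()):
    """Return which of a value's label/aliases appear in a reference's text.

    Matching is case-insensitive, on whole words, and treats hyphens and
    spaces as interchangeable ("African-American" matches "African American").
    """
    found = []
    for term in [label, *aliases]:
        # Split the term into words; allow a space or hyphen between them.
        words = [re.escape(w) for w in re.split(r"[\s-]+", term.strip()) if w]
        pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


def sourced_command(qid, prop, value, ref_url, retrieved):
    """Build a V1 command citing a reference URL (P854) and retrieval date (P813)."""
    # URLs are quoted strings; "/11" marks day precision for the date.
    return "\t".join([qid, prop, value,
                      "S854", f'"{ref_url}"',
                      "S813", f"+{retrieved.isoformat()}T00:00:00Z/11"])


if __name__ == "__main__":
    # Placeholder item and URL, for illustration only.
    text = "She was an African-American painter born in Harlem."
    if find_mentions(text, "African Americans", ["African American"]):
        print(sourced_command("Q100", "P172", "Q49085",
                              "https://example.org/bio", date(2018, 5, 1)))
```

Keeping the matched term alongside the command would let a reviewer see in the sheet why a reference was proposed before approving it.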
We will also create clear documentation, since others need it to use a tool like this. Preferably this would include video documentation, which most wiki tools lack; its absence creates a significant barrier to entry. We will hold one in-person training at a WMNYC meeting.

How will you let others in your community know about your project (please provide links to where relevant communities have been notified of your proposal, and to any other relevant community discussions)? Why are you targeting a specific audience?

  • We have notified several relevant WikiProjects via talk pages:
    • Women in Red [3]
    • Black Lunch Table [4]
    • AfroCrowd [5]
    • Art+Feminism [6] and [7]
    • Whose Knowledge [8]
    • Wikipedia:WikiProject Biography [9]
    • Wikidata:WikiProject Women [10]
  • We have posted on Wikidata:Project chat here: [11] and here: [12]

What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

  1. Improved software so it can add sources to Wikidata in a semi-automated process
  2. Tested the software by adding P172 data to at least 500 items
  3. Created written and video documentation on how to use the software
  4. Held an in-person training on how to use the software

Impact


How will you know if the project is successful and you've met your goals? Please include the following targets and feel free to add more that are specific to your project:

  1. Number of total participants: 10 trained editors, via the in-person training
  2. Number of articles created or improved: at least 500 Wikidata items during the testing and training period. If the software is widely used, we expect it could result in upwards of a hundred thousand improved items.
  3. Number of photos uploaded to Wikimedia Commons: N/A
  4. Number of photos used on Wikimedia projects: N/A

Resources


What resources do you have? Include information on who is organizing the project, what they will do, and if you will receive support from anywhere else (in-kind donations or additional funding).

  • Working version of software tailored to A+F P106 needs
    • The current version of the software cost $4000 for Danara's labor. Funding for this comes from:
      • $1000 funded under the Webmastering line item in the Art+Feminism budget.
      • $3000 funded from a $3000 research grant from CUNY, via Michael Mandiberg.
  • Allied editors who are invested in a similar goal and have helped me with tricky wiki challenges, both social and technical!
  • Community partnerships with Art+Feminism, Black Lunch Table, Women in Red, AfroCrowd, WikimediaNYC, Whose Knowledge, Wikidata:WikiProject Women
  • Technical Lead and Manager: Michael Mandiberg, creator of en:Print Wikipedia
  • Python Programmer: Danara Sarioglu, current developer for en:Print Wikipedia

What resources do you need? For your funding request, list bullet points for each expense:

  • Technical Lead and Manager labor: 30 hours, provided on a volunteer basis
  • Programmer Labor: 57 hours @ $35/hr = $1995

Endorsements

  • Support: This development will help to improve non-specialist editing of Wikidata which at present lacks user-friendly interfaces.--Ipigott (talk) 08:21, 5 April 2018 (UTC)
  • Strong support - This work will provide improvements to the content gender gap issues faced by Women in Red and other groups working within this scope. --Rosiestep (talk) 17:18, 27 April 2018 (UTC)
  • Support: Considering this will make things more user-friendly and further the bridge to collaboration between Wikipedia and Wikidata along with helping projects flesh out their work in a more efficient way, more impact, and nominal cost, sure I'm in. I don't see any drawbacks really.--Heathart (talk) 20:54, 27 April 2018 (UTC)
  • Support This is a modest request which will produce a practical result and a model for others. The organizer has a history of high impact high profile returns for every grant funded project which they have attempted. Unusually among wiki-grant funded project this team includes external participants to the project to a high degree. This is good for multiple projects doing outreach. It is easy for me to support this. Blue Rasberry (talk) 14:06, 28 April 2018 (UTC)
  • Support. ~ Rob13Talk 05:28, 3 June 2018 (UTC)