Grants:Project/Wikidata Impact: mapping records quality and user experience
Project idea
What is the problem you're trying to solve?
What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.
Wikidata is a relatively young project with tremendous potential to empower the Wikimedia projects, knowledge, and community. Adoption of Wikidata in the GLAM sector is increasing, and initiatives such as Sum of all paintings and Art + Feminism have significantly increased the number and quality of GLAM items in the project. Moreover, as GLAMs open up their collections, they see the opportunity to include their data on Wikidata. The clear advantages of adding museum data to Wikidata include tapping into the active community that will continue enhancing the records, automated translations into multiple languages, integration with other Wikimedia projects, linkage to other data records, and access to the search query and visualization tools available. Another advantage is that adding GLAM data would improve the positioning of the objects in the global network of information available to search engines, voice assistants, and other potential uses.
However, adoption is limited in some countries and in some GLAMs, especially small organizations with limited resources. The challenge is two-fold. On the one hand, there is a lack of knowledge and skills to use Wikidata. We believe this is because people do not know that Wikidata serves to interlink content across languages, to connect with external datasets, and, perhaps most importantly, to facilitate access to all the desired knowledge. On the other hand, there is currently limited research demonstrating the impact of this work, which makes it difficult for GLAM institutions to justify investing resources in it. The existing Wikidata analytics tools (e.g. Wikidata Usage and Coverage in WMF Project) are limited to visually presenting the GLAM data in an aggregated or individual format. Moreover, there are potential usages of the Wikidata records that are not being tracked.
The main problem we would like to solve with this project is to understand the current usage of Wikidata in GLAMs, both in terms of the number of items and the quality of the data entered for those items, and to measure the impact of this work. Little existing work aggregates the current Wikidata content of GLAM institutions around the world and measures the impact on the usage of this data.
What is your solution to this problem?
For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.
We propose to examine the usage of Wikidata by GLAM institutions, mapping the number of items, the quality of the items based on the properties entered, the usage of this data on other Wiki projects, and the impact on article views. The results of this research will form a toolkit for GLAMs and content contributors with recommendations on how to increase and measure the impact of Wikidata.
Project goals
What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.
The project goals are the following:
- Goal 1: To map the current usage of Wikidata by GLAM institutions: to benefit the Movement by improving coordination with GLAM stakeholders and helping identify geographic areas with insufficient representation.
- Goal 2: To quantify the visibility of GLAMs in Wikipedia: to benefit stakeholders when requesting funds from their management to participate in Wikimedia Foundation projects. This will also help Wikimedians establish a metric that can be used to evaluate the impact of Wikidata.
- Goal 3: To identify the key properties of Wikidata: to benefit all members of the movement by supporting decision-making to increase the visibility of diverse content (e.g. content from the Global South or associated with gender), eventually improving user experience. This will also help GLAM stakeholders identify the impact of Wikidata and so increase their interest in the project. It may also be beneficial for Wikidata supporters when considering elements to evaluate or tools to automate (e.g. evaluation tools for all Wikidata content containing the identified key properties).
- Goal 4: To document the impact of Wikidata: to benefit all editors (by reducing duplicated work and by facilitating the dissemination of contributed content), as well as future potential users and editors who may be new to the Wikidata project.
- Goal 5: To identify the impact assessment needs and discuss the current limitations of the analytics tools: to help the Wikimedia Foundation create future tools for the GLAM sector.
Project impact
How will you know if you have met your goals?
For each of your goals, we’d like you to answer the following questions:
- During your project, what will you do to achieve this goal? (These are your outputs.)
- Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)
For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (i.e. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.
Goal 1: To map the current usage of Wikidata by GLAM institutions:
Output: A table with all GLAMs present in Wikidata, by geographic location, ranked by use of properties. Special attention will be given to key properties related to sector identifiers (e.g. IIIF, harmonized naming conventions).
Method: We will identify GLAM institutions through their unique identifiers and identify their content, probably isolating a selection of objects (e.g. paintings, prints). After this step we will look at the number of Wikipedia pages on which each item is used and the languages of these pages.
Outcome: The Movement will have a list of the geographic areas not represented, as well as the type of content not represented in Wikidata, to guide strategic work. The Wikidata project will get an overview of the most used properties, as well as the properties with more impact, to support their communication with future contributors.
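As an illustration of the first step of the method for Goal 1, the following is a minimal sketch of how institutions could be listed by country through the Wikidata Query Service. It is not the project's actual pipeline: the function names are hypothetical, the choice of the museum class (Q33506) as a starting point is an assumption, and the sample rows are made up so that the tallying step can be shown without a network call.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_glam_query(class_qid="Q33506", limit=1000):
    """Build a SPARQL query listing institutions of a class with their country.

    Q33506 (museum) is only an assumed starting class; other GLAM classes
    would be passed in the same way.
    """
    return f"""
SELECT ?institution ?country WHERE {{
  ?institution wdt:P31/wdt:P279* wd:{class_qid} ;
               wdt:P17 ?country .
}}
LIMIT {limit}
""".strip()


def run_query(query):
    """Send the query to the endpoint and return result rows (needs network)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "glam-mapping-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def count_by_country(bindings):
    """Count distinct institutions per country from SPARQL JSON result rows."""
    seen = set()
    counts = Counter()
    for row in bindings:
        pair = (row["institution"]["value"], row["country"]["value"])
        if pair not in seen:
            seen.add(pair)
            counts[pair[1]] += 1
    return counts


# Made-up rows in the shape of SPARQL JSON results (placeholder identifiers).
sample_rows = [
    {"institution": {"value": "inst-1"}, "country": {"value": "country-A"}},
    {"institution": {"value": "inst-2"}, "country": {"value": "country-A"}},
    {"institution": {"value": "inst-2"}, "country": {"value": "country-A"}},
    {"institution": {"value": "inst-3"}, "country": {"value": "country-B"}},
]
sample_counts = count_by_country(sample_rows)
print(sample_counts.most_common())
```

Countries that never appear in such a tally, or appear with very few institutions, would be candidates for the list of under-represented geographic areas.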
Goal 2: To quantify the visibility of GLAMs in Wikipedia:
Output: A summative statistical analysis of the article pageviews to understand the distribution of the article views. This analysis will include a table ranking the most viewed objects from GLAMs, based on those present in Wikidata, in all (or selected) languages of Wikipedia. We will decide how many languages to review depending on the size of the dataset.
Method: We will gather the number of views per Wikipedia page, in various languages, of our GLAM content dataset. Depending on the size, we may select a few languages (we hope to do them all!). The tools to undertake this task may include the Wikimedia API and the Pageviews tool.
Outcome: This ranking will serve future evaluations to compare changes in visibility of content, spread of content across languages, or other future questions. The ranking can further be used to argue for the key role of Wikipedia (and Wikimedia as backbone structure) for the global knowledge infrastructure.
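The data gathering described for Goal 2 could be sketched as follows, assuming the per-article endpoint of the Wikimedia Pageviews REST API is used. The helper names are hypothetical, the article title is only an example, and the payload at the end is made up to show how views would be summed; the network call itself is defined but not executed here.

```python
import json
import urllib.parse
import urllib.request

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(project, title, start, end, granularity="monthly"):
    """Build a per-article pageviews URL; dates are given as YYYYMMDD strings."""
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return (
        f"{PAGEVIEWS_BASE}/{project}/all-access/user/"
        f"{article}/{granularity}/{start}/{end}"
    )


def fetch_pageviews(url):
    """Download and decode the JSON payload for one article (needs network)."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "glam-pageviews-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def total_views(payload):
    """Sum the 'views' field over all items in a pageviews payload."""
    return sum(item["views"] for item in payload.get("items", []))


# Example URL for one article in one language edition.
example_url = build_pageviews_url(
    "en.wikipedia.org", "The Night Watch", "20210101", "20210331"
)
print(example_url)

# Made-up payload in the shape returned by the API, for illustration only.
fake_payload = {"items": [{"views": 10}, {"views": 5}, {"views": 7}]}
print(total_views(fake_payload))
```

Repeating the request per language edition and per article would yield the table of views from which the ranking is built.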
Goal 3: To identify key properties of Wikidata for GLAM object records:
Output: A list of the most used Wikidata properties with a ‘rate of impact’ based on (i) the most usage in Wikipedia projects, and (ii) the most views in Wikipedia projects.
Method: We will link the reuse of content in different languages and in different Wikipedia pages based on the usage of Wikidata properties. We will analyze the potential correlation between the quality of Wikidata items (based on the Objective Revision Evaluation Service, ORES) and the visibility of the item. As part of this analysis, we will examine whether linking images to Wikidata items has any impact on their usage.
Outcome: Our results may inform the RECOIN team, to confirm their system is working, or to suggest new elements to consider. Results can support the ORES team by establishing the link between the use of GLAM images and page views, for future comparison with other types of images. Results can also be used to document the adoption of Wikidata, to highlight the importance of certain properties, and to reflect on the general digital literacy (of GLAMs and of other contributors). Results may also be used to increase the usability of software linked to Wikidata (e.g. Wikidata pointers from TMS). Future research could consider checking the degree to which GLAMs contribute all of their metadata or only part of it to Wikidata, by comparing our results to the institutional websites.
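The correlation analysis mentioned in the method for Goal 3 could take several forms; the proposal does not name a statistic. As one possible sketch, the code below computes a Spearman rank correlation between an item quality score and page views, which is an assumed choice suited to skewed view counts. All numbers are made up, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: a numeric quality score per item and its monthly views.
quality_scores = [1, 2, 2, 3, 4, 5]
monthly_views = [40, 35, 120, 300, 280, 900]
rho = spearman(quality_scores, monthly_views)
print(round(rho, 3))
```

A coefficient near 1 would suggest that better-described items tend to be viewed more, although a correlation of this kind would not by itself show that the properties cause the views.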
Goal 4: To document the impact of Wikidata for the GLAM sector:
Output: Based on the results from goals 1 through 3, we will establish the impact of Wikidata by linking the usage of Wikidata, based on the presence of key properties, to the use of content in various Wikipedia languages, based on number of views.
Method: We hope to establish the relation between the quality of Wikidata usage by GLAMs (a sort of GLAM-specific RECOIN) and page views in Wikipedia, in all languages (for a sort of GLAM-specific ORES). This section needs to be further discussed.
Outcome: The Movement will be able to establish a clear metric for the impact of Wikidata, relating the number of key fields entered to the share increase in usage and views in various languages. Future research may want to look at the link between Wikidata and other datasets and establish whether the key properties are indeed the most appropriate to connect with external datasets. We expect they will be (!).
Goal 5: To identify the impact assessment needs and discuss the current limitations of the analytics tools
Output: An impact assessment framework with a list of metrics that will help GLAMs report the impact of their work on Wikidata.
Method: Based on the current usage of Wikidata and discussions with the advisory board and GLAM professionals, we will identify the key needs to demonstrate the impact of Wikidata. We plan to identify the internal usage of Wikidata on other wiki projects (e.g. on Wikipedia) but also on external platforms that impact the user experience (e.g. Google knowledge graph, voice assistants, GLAM websites and apps).
Outcome: An increased understanding of what can be tracked regarding the usage of Wikidata content and of what the current tracking limitations are. Based on the findings, we will provide recommendations to improve analytics tools.
Do you have any goals around participation or content?
Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.
We will map all the GLAM profiles in Wikidata. The rest of the data gathered and analysed will depend on the magnitude of the GLAM profiles. We expect to find more than 10,000 images from more than 800 GLAMs (based on previous research).
Project plan
Activities
Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?
We envision the following activities over a period of one year (12 months); the method for each goal is described in the section above.
Phase 1: Discovery phase (Nov - Dec 2021) = We will explore the current tools available and the current projects in the Wikidata and GLAMs community. This is the preliminary research to inform our data gathering strategy and analysis. We will be consulting with members of the movement. We will do this with two researchers 1.5 days a week.
Phase 2: Data collection and analysis (Jan - June 2022) = Based on the preliminary research, we will gather data from Wikidata and from Wikipedia using the various tools available and in communication with the Wikidata team (particularly Sandra Fauconnier). We will analyse the results and explore conclusions.
We will do this with 2 senior researchers and 2 assistant researchers (2 days a week).
Phase 3: Toolkit development and communication (July - Oct 2022) = We will develop a toolkit for the GLAMs. We may consider a few case studies to inform the feasibility of the toolkit. We will further disseminate the toolkit through selected meetings (e.g. Wikimedia, academic, GLAM related). We will draft a final report to increase usability of findings and support of future research.
We will meet with members of our advisory board on a (bi)monthly basis.
Budget
How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!
- Senior Researcher, Erasmus University Rotterdam: 30,000 EUR = 36,000 USD
- MA Research Assistant, Erasmus University Rotterdam: 10,000 EUR = 11,000 USD
- Senior Researcher, Pratt Institute (course release, Spring 2022): 14,000 USD
- Graduate Assistant, Pratt Institute (spring and summer 2022; 10 hours a week for 24 weeks at $16 per hour): 3,840 USD
- Toolkit design and dissemination: 5,000 EUR = 6,000 USD
- TOTAL: 70,840 USD
Budget rationale:
The Senior and Assistant Researcher fees are established by the institutions (Erasmus University Rotterdam and Pratt Institute). These are calculated by the supporting grants office based on the hourly work expected.
The Toolkit design and dissemination will be produced by a design firm.
The EUR = USD rate was calculated as of March 2021.
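The budget figures above can be cross-checked with a few lines of arithmetic. This is only a verification sketch using the amounts listed in this section; the dictionary names are hypothetical. It recomputes the Graduate Assistant line from the stated hours and rate, sums the USD amounts, and derives the EUR-to-USD rate implied by each converted line.

```python
# USD amounts as listed in the budget; the Graduate Assistant line is
# recomputed from 10 hours a week, 24 weeks, 16 USD per hour.
lines_usd = {
    "Senior Researcher, Erasmus University Rotterdam": 36_000,
    "MA Research Assistant, Erasmus University Rotterdam": 11_000,
    "Senior Researcher, Pratt Institute": 14_000,
    "Graduate Assistant, Pratt Institute": 10 * 24 * 16,
    "Toolkit design and dissemination": 6_000,
}

# EUR amounts for the lines that were converted from euros.
lines_eur = {
    "Senior Researcher, Erasmus University Rotterdam": 30_000,
    "MA Research Assistant, Erasmus University Rotterdam": 10_000,
    "Toolkit design and dissemination": 5_000,
}

total_usd = sum(lines_usd.values())

# Rate implied by each converted line (USD amount divided by EUR amount).
implied_rates = {name: lines_usd[name] / eur for name, eur in lines_eur.items()}

print("Graduate Assistant:", lines_usd["Graduate Assistant, Pratt Institute"])
print("Total USD:", total_usd)
for name, rate in implied_rates.items():
    print(f"{name}: {rate:.2f} USD per EUR")
```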
The tasks for the research team are as follows:
Senior Researchers
- Coordinate research project and organize bi-monthly meetings and communication with the advisory board
- Identify and recruit GLAM professionals to provide input during the development of the impact assessment framework and toolkit
- Define the project scope and identify the data to be collected from Wikidata and Wikimedia
- Undertake the data collection and analysis
- Hire and manage research assistants
- Document data analysis process, produce toolkit content and research publications to share the results with the Wikimedia and GLAM communities
Research assistants
- Conduct a review of relevant literature and tools
- Support data collection and analysis
- Administrative support during the project (e.g. minute-taking)
- Help with the dissemination of the toolkit and research results
Community engagement
How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation helps make projects successful.
We aim to engage with GLAM professionals for feedback during the development of the toolkit and its dissemination.
The following individuals have agreed to be part of our advisory board, to support our project, help us to identify community needs and attend a meeting on a (bi)monthly basis.
- Fiona Romeo, Senior Program Manager, GLAM and culture.
- Andrew Lih, Wikimedian at Smithsonian and Wikimedia Strategist at The Metropolitan Museum of Art.
- Silvia Gutierrez, Volunteer, Wikimedia Foundation; Digital Humanities Librarian, El Colegio de México.
- Megan Wacha, President Wikimedia New York. Scholarly Communications Librarian at City University of New York.
- Lucy Patterson, project manager, Wikimedia Deutschland.
- Georgina Burnett, relationship manager for data partnerships on Wikidata and Wikibase, Wikimedia Deutschland.
- João Alexandre Peschanski, Coordinator, Wiki Movimento Brasil. Professor, Faculdade Cásper Líbero
- Érica Azzellini, Communications Manager, Wiki Movimento Brasil
We will communicate and disseminate results via several channels:
- National Wikimedia chapters, including the Netherlands, USA, Italy, and the Iberoamericas.
- GLAM-Wiki community, via the monthly newsletter.
- Wikimedia-related meetings, including annual meetings, GLAM meetings, and academic meetings.
- GLAM conferences (e.g. Museum Computer Network)
- Workshops and activities in the programs at the School of Information, Pratt Institute (MS. Museums and Digital Culture, MS. Library Information Science) and Erasmus University Rotterdam
Get involved
Participants
Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.
Trilce Navarrete is a Lecturer at the Erasmus University Rotterdam, School of History, Culture and Communication. She has been a member of Wikimedia Nederland since 2013. Trilce has spoken at various Wikimedia conferences organized by GLAM-Wiki and the Dutch and Belgian chapters. She is active in disseminating the value of open data and presents the Wikimedia Foundation as the best-practice example. As an editor, Trilce has made her 10th edit (in 2019!), but she is active in promoting the value of contributing to Wikimedia among her students. In 2019, a group of students conducted a project about Wikipedia, in collaboration with Wikimedia Nederland.
Together, Trilce and Elena presented some of their work at the GLAM March 2021 office hours.
Elena Villaespesa is an Assistant Professor, School of Information, Pratt Libraries. She conducts research and teaches courses in Digital Analytics, User Research and Digital Strategy applied to the GLAM sector. Elena also works as a Digital Analyst at The Metropolitan Museum of Art. As part of this work she has participated in various wiki-editathons and produced regular reporting on the impact of the Open Access initiatives at the museum. Some examples of the data published related to The Met on Wikipedia can be found in these two blog posts (https://www.metmuseum.org/blogs/now-at-the-met/2018/open-access-at-the-met-year-one, https://www.metmuseum.org/blogs/collection-insights/2018/open-access-images-spanish-wikipedia).
Applicants have worked independently and together researching the usage of museum paintings on Wikipedia. With the aim of advocating for museums to open up their collections, we analyzed in detail the collaborations between museums and Wikimedia platforms, discussed the limitations of the current model, and provided evidence of how these initiatives have increased the visibility of museum collections. This research led to the following publications:
Navarrete, T. and Villaespesa, E. (2020) Digital Heritage Consumption: The Case of the Metropolitan Museum of Art. magazén, 1:2. Edizioni Ca’ Foscari - Digital Publishing DOI: http://doi.org/10.30687/mag/2724-3923/2020/02/004
Navarrete, T. and Villaespesa, E. (2020), Image-based information: paintings in Wikipedia, Journal of Documentation, Vol. 77 No. 2, pp. 359-380, DOI: https://doi.org/10.1108/JD-03-2020-0044
Villaespesa, E., and Navarrete, T. (2019). Museum Collections on Wikipedia: Opening Up to Open Data Initiatives. Museums and the Web 2019, Boston. https://mw19.mwconf.org/paper/museum-collections-on-wikipedia-opening-up-to-open-data-initiatives/
This research work has been presented at the Museums and the Web conference 2019, Wikipedia Day 2019, and GLAM Office hours 2021.
Navarrete, T. and Borowiecki, K.J. (2016) “Change in access to heritage after digitization: ethnographic collections in Wikipedia.” Cultural Trends. 2016, 25(4):233-248.
Navarrete, T. and Borowiecki, K.J. “The Long-tail of Museum Collections: Ethnographic collections onsite and online”. UNESCO papers. 2016.
Navarrete, T. “Adoption of computers in Dutch Museums: interpreting the new tool.” Tijdschrift voor Mediageschiedenis. 2015, 18(1):101-116.
Applicants’ knowledge of analytics tools, work experience and connections in the GLAM sector, and previous research on the usage of Wikimedia tools by museums will contribute substantially to the execution and dissemination of the proposed research.
Community notification
You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
Endorsements
Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).