Content Partnerships Hub

Improving the Wikimedia movement's work with content partners

Metabase for the global movement

Background

In this case study, we report on our efforts to populate Metabase with sample data about the resources and activities published by the Wikimedia movement, with a particular focus on content partnerships. The main goal is to evaluate whether the structured data format is a suitable tool for storing this sort of information.

Additionally, we wanted to examine and evaluate the modeling structures we had developed while working with Wikimedia Sverige's own data. While it is difficult to work with data created by others without outside input, we hoped to at least get an idea of the challenges.

General information about how Wikibase is set up and why it was created can be found in the case studies Setting up a Wikibase from scratch and Metabase for chapter data.

As described in Wikimedia Sverige's grant application, one of the goals of Metabase is to make it easier for everyone to identify and find available material on content partnerships. For years, partners and individuals have been sharing their experiences and insights in various forms and on various platforms, both within the Wikimedia ecosystem and beyond. There is a lot of exciting material: reports, blog posts, newsletters, slides, posters and video recordings. The strength of the GLAM-Wiki movement is that it encompasses a great diversity of experiences, skills and voices. There is no problem with sharing one's work in the most suitable form.

On the other hand, it can be difficult for others to find your resources and learn from them. That is why capacity building is central to the Content Partnerships Hub, which Wikimedia Sverige is working to establish. For affiliates and volunteers around the world to build stronger content partnerships, they need to be able to learn from each other; a small, newly founded chapter should not be forced to reinvent the wheel while there is a wealth of resources, created by more established chapters, to draw on. Our vision for the Content Partnerships Hub is that it facilitates the flow of knowledge between affiliates and volunteers, and makes it easy for everyone to share these resources and benefit from them. To achieve this, we need a technical infrastructure that is both flexible and easy to use.

Part of Wikimedia Sverige's contribution to capacity building at the global level is the Content Partnerships Hub Helpdesk, an infrastructure that offers Wikimedians practical support in planning and carrying out content partnerships, particularly for local communities that are currently underserved and underrepresented in the movement. Wikimedians can submit questions and requests to the Helpdesk, and the Hub's staff will either provide the help needed or connect the requester with someone who can. The work of the Helpdesk is guided by an Expert Committee made up of experienced members of the movement with diverse backgrounds. We see Metabase as a natural extension of the Helpdesk: we can add material identified while answering a specific request, evaluate and analyze which capacity-building material exists, identify what is missing or needs updating, and so on. We want Metabase to become a place where everyone can search the global library of Wikimedia resources themselves, and contribute to it.

Since the data is stored in a linked, structured knowledge base, you can search it according to your needs. For example, you can search for links to YouTube tutorials about Wikimedia Commons in Swedish, or for slide files from presentations about library partnerships. Incoming Helpdesk requests could also be stored in Metabase, making it easier for the community in general and the Expert Committee to get an overview of what has been done and which materials the Helpdesk created specifically to fulfil requests.
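To illustrate the kind of search described above, here is a minimal Python sketch that filters structured records by resource type, main subject and language. It does not use the Metabase or Wikibase API; the `Resource` fields, the `find_resources` helper and the sample records are all hypothetical and only stand in for data that would really live in the knowledge base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One catalogued resource; all field names are hypothetical."""
    title: str
    kind: str            # e.g. "video tutorial" or "slide deck"
    subjects: frozenset  # main subjects of the resource
    language: str        # language code, e.g. "sv" or "en"
    url: str


def find_resources(resources, kind=None, subject=None, language=None):
    """Return the resources that match every criterion that was given."""
    return [
        r for r in resources
        if (kind is None or r.kind == kind)
        and (subject is None or subject in r.subjects)
        and (language is None or r.language == language)
    ]


# Invented sample records, for illustration only.
SAMPLE = [
    Resource("Commons tutorial (Swedish)", "video tutorial",
             frozenset({"Wikimedia Commons"}), "sv", "https://example.org/a"),
    Resource("Commons tutorial (English)", "video tutorial",
             frozenset({"Wikimedia Commons"}), "en", "https://example.org/b"),
    Resource("Working with libraries", "slide deck",
             frozenset({"library partnerships"}), "en", "https://example.org/c"),
]

swedish_commons_videos = find_resources(
    SAMPLE, kind="video tutorial", subject="Wikimedia Commons", language="sv")
library_slides = find_resources(
    SAMPLE, kind="slide deck", subject="library partnerships")
```

Because every criterion is optional, the same helper covers both example searches; in a real setting the equivalent filtering would more likely be expressed as a query against the knowledge base itself.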

Scope

The Movement material in scope of Metabase encompasses:

Conferences and other events (seminars, edit-a-thons, campaigns etc.) about Wikimedia-related topics;
Presentations and other contributions, such as panel participation, by Wikimedians and/or on Wikimedia topics, in events not organized by the Wikimedia movement;
Material produced as a result of, or in connection with, the above, such as slide decks, posters, reports, video recordings;
Publications, such as articles, blog posts, tutorials – in both text and video form – on Wikimedia topics.

Limitations

It should be kept in mind that our goal with the initial development of the Metabase content has not been to fully cover any particular subject area. Our staff resources and time are limited, so we had to decide which direction of work would bring the most benefit to the project. One alternative would have been to focus on one particular area, research it in depth and provide full coverage. It was definitely an attractive alternative – who doesn't like exploring one specific topic? – but it would have meant that we could not present a nuanced view of what Metabase could be. Our goal with the project is to explore the possibilities of the platform and experiment with different topics before we invite everyone in the movement to build on the foundations and expand them.

For this reason, we prioritized covering a broad set of examples shallowly over covering a few in depth. We hope that this allows us to showcase the opportunity space better. Besides, we would not have been able to research the details of the work done by Wikimedians globally, as we are just a couple of individuals with extremely limited language skills, which prevents us from familiarizing ourselves with the vast amount of work done beyond the Anglosphere. As a result, there will be gaps even within the areas we chose to focus on specifically. Our ambition has been to provide a starting point for the Movement with plenty of examples to make it as easy as possible for anyone to pick up where we left off and continue developing the content.

We also hope that our work will be discussed and criticized, so that the final shape of Metabase is a collaborative effort by the global community. We have built a foundation but in the end, a comprehensive knowledge base will require continuous work from the community.

Method

We tested two approaches to filling Metabase with data.

The first approach was event-focused. We selected two GLAM Wiki conferences (GLAM Wiki 2023 in Uruguay and GLAM Wiki 2018 in Israel) and input the information about the events and activities that took place during them. The reason for working on these particular events is that they combine an international scope with a focus on collaboration with cultural heritage institutions, which is in line with our vision for Metabase as a resource for content partnerships. We assumed we could find many relevant presentations and documents there.

The second approach was topic-focused. We chose OpenRefine as a focus topic. This open-source software is being used widely by the Wikimedia community for uploads and editing on both Wikidata and Wikimedia Commons. The Content Partnerships Hub Helpdesk, Wikimedia Sverige's support infrastructure for the Movement, regularly receives requests from volunteers and affiliates that can be fulfilled using OpenRefine, so we are aware there is a great need among Wikimedians to learn to use the software in different contexts.

Apart from Wikimedians, OpenRefine is used by data journalists, scientists, information professionals and others. Since the software has a broad range of applications in several communities, a lot of information resources have been created, and we believe it is worth the effort to collect them in one place, to facilitate knowledge exchange between communities. The resources take a broad range of forms, from help pages on the Wikimedia platforms to blog posts, YouTube videos, presentation slides and scholarly articles.

Another aspect of OpenRefine that makes it an interesting topic for this case study is that its Wikimedia Commons features are relatively new, leading to a lot of interest from Wikimedians wishing to start using it for file uploads and Structured Data on Commons (SDC) work. In order to do that, they need to be able to locate and access the available resources, which is exactly what Metabase sets out to facilitate.

Event data input

The workflow of inputting data from a multi-part event, like a conference, is as follows:

Locate the conference program.
Create an item for the conference.
Create items for each session.
Create any items necessary to describe the details of the session, such as the person(s) and organization(s) involved, the language of the session, or the main subject.
Link the session items to the conference item using part of / has part(s).
Create an item for the slide deck used, if any, and link it to the session item using uses / used by.

See, for example, the session A missing piece of the puzzle: Providing direct support for content partnerships through the Helpdesk at the Content Partnerships Hub at the GLAM Wiki 2023 conference.
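The workflow above can be sketched in Python as a minimal in-memory model. This is not the Wikibase API: `ItemStore`, its methods and the generated item ids are hypothetical stand-ins, and the main subject and slide deck labels are placeholders chosen for illustration. Only the property pairs (part of / has part(s), uses / used by) and the conference and session titles come from the text.

```python
from collections import defaultdict


class ItemStore:
    """A tiny in-memory stand-in for a Wikibase; not the real Wikibase API."""

    def __init__(self):
        self.labels = {}                     # item id -> label
        self.statements = defaultdict(list)  # item id -> [(property, value id)]

    def create_item(self, label):
        item_id = f"Q{len(self.labels) + 1}"  # placeholder ids, not Metabase ids
        self.labels[item_id] = label
        return item_id

    def add_statement(self, subject, prop, obj):
        self.statements[subject].append((prop, obj))

    def link(self, subject, prop, obj, inverse_prop):
        """Add a statement and its inverse, e.g. part of / has part(s)."""
        self.add_statement(subject, prop, obj)
        self.add_statement(obj, inverse_prop, subject)


store = ItemStore()

# One item for the conference, one for each session in the program.
conference = store.create_item("GLAM Wiki 2023")
session = store.create_item(
    "A missing piece of the puzzle: Providing direct support for content "
    "partnerships through the Helpdesk at the Content Partnerships Hub")

# Supporting items that describe the session (this subject is a placeholder).
subject = store.create_item("content partnerships")
store.add_statement(session, "main subject", subject)

# Link the session to the conference with part of / has part(s).
store.link(session, "part of", conference, "has part(s)")

# Item for the slide deck, linked to the session with uses / used by.
slides = store.create_item("Slide deck (placeholder label)")
store.link(session, "uses", slides, "used by")
```

Storing each link in both directions mirrors the paired properties named in the workflow, so that a conference item lists its sessions and a slide deck item points back to the session that used it.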

The session format

What is a conference session? Intuitively, we first assumed that every session described in the conference program would be a discrete presentation. However, this turned out not to be the case. Several independent presentations by different speakers can be grouped in a thematic session under a common title. This session will have one entry in the conference program. This is a typical format for lightning talks, but is not limited to them. Due to this, we decided on a session model:

A conference consists of several sessions.
These are linked using has part(s) / part of.
A session can be either a hybrid event, an in-person event or an online event. This means that every session has two instance of statements.
A session consists of one or several specific activities. For example, a session can contain three separate presentations, by different speakers, on the same topic. What brings them together is that they are grouped as one session in the program. This is modeled using has part(s) of the class and the qualifier quantity.
- Example: Wikidata for cultural heritage, which contains three presentations. Compare the description in the conference program.
A session can have one or several speakers (people who present) or leaders (people who facilitate a practical activity, like a workshop).
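As a rough illustration of the session model, the following Python sketch represents a session as a small data class and flattens it into statements. The class name, field names and the "conference session" label are assumptions made for this example, as are the attendance format and the speaker names given to the sample session; the property names and the three presentations are taken from the description above.

```python
from dataclasses import dataclass, field

ATTENDANCE_FORMATS = ("hybrid event", "in-person event", "online event")


@dataclass
class Session:
    title: str
    attendance_format: str                     # one of ATTENDANCE_FORMATS
    session_class: str = "conference session"  # assumed class label
    parts: dict = field(default_factory=dict)  # activity class -> quantity
    speakers: list = field(default_factory=list)
    leaders: list = field(default_factory=list)

    def __post_init__(self):
        if self.attendance_format not in ATTENDANCE_FORMATS:
            raise ValueError(
                f"unknown attendance format: {self.attendance_format}")

    def statements(self):
        """Flatten the session into (property, value, qualifiers) tuples."""
        result = [
            ("instance of", self.session_class, {}),
            ("instance of", self.attendance_format, {}),
        ]
        for activity_class, quantity in self.parts.items():
            result.append(("has part(s) of the class", activity_class,
                           {"quantity": quantity}))
        result += [("speaker", name, {}) for name in self.speakers]
        result += [("leader", name, {}) for name in self.leaders]
        return result


session = Session(
    title="Wikidata for cultural heritage",
    attendance_format="in-person event",  # placeholder, not checked against the program
    parts={"presentation": 3},
    speakers=["Speaker A", "Speaker B", "Speaker C"],  # placeholder names
)
```

Recording the grouped presentations as a class plus a quantity, rather than as three separate items, keeps the session as the unit that matches one entry in the conference program.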

Topical data input

In order to input data about resources related to a particular topic, the resources have to be identified. The following sources were used:

Wikimedia Commons categories OpenRefine slide presentations and OpenRefine video presentations.
OpenRefine/Presentations on Meta.
Programs of the major Wikimedia conferences, such as Wikimania and WikidataCon.
Google searches.

Due to our own limitations and bias, the majority of the resources identified were in English.

Results

Events

The conferences GLAM Wiki 2023 and 2018 were input into Metabase. The two conferences cover 146 sessions. 100 unique index terms (keywords) were used to describe the topics of the sessions.

Showcase SPARQL queries

Topics

As of June 2024, there are 74 items with the statement main subject = OpenRefine. A large number of those are events, such as conference sessions. 46 of them are published documents of different types: mostly slide decks from presentations, but also a number of video recordings, tutorials and blog posts. The majority are in English, with a small number of resources in Swedish and other Western European languages. This reflects our limitations when locating the resources, since we assume that resources do exist in other languages, and it makes clear how important it is for more participants from different backgrounds to contribute. An additional benefit of using Metabase to survey the available resources is that it makes patterns easier for everyone to notice (which languages are over-represented, and which under-represented, in relation to the number of Wikimedians who might need them?) and provides groundwork for prioritizing the creation and translation of resources in the most needed languages.

We can use the fact that many of the resources have multiple main subject statements to examine which topics most frequently co-occur with OpenRefine. Those include, not surprisingly, Wikidata, Wikimedia Commons and upload. We imagine that as our data grows, we will be able to gain interesting insights about different co-occurring topics.
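The co-occurrence count described above can be sketched in a few lines of Python. The function name and the sample items are hypothetical; real input would be the main subject statements exported from Metabase.

```python
from collections import Counter


def co_occurring_subjects(items, focus):
    """Count how often other main subjects appear on items about `focus`."""
    counts = Counter()
    for subjects in items:
        unique = set(subjects)  # ignore accidental duplicates on one item
        if focus in unique:
            counts.update(unique - {focus})
    return counts


# Invented sample: each tuple lists the main subjects of one item.
SAMPLE_ITEMS = [
    ("OpenRefine", "Wikidata"),
    ("OpenRefine", "Wikimedia Commons", "upload"),
    ("OpenRefine", "Wikidata", "Wikimedia Commons"),
    ("GLAM",),
]

counts = co_occurring_subjects(SAMPLE_ITEMS, "OpenRefine")
```

Calling `counts.most_common()` then lists the co-occurring topics from most to least frequent.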

Since published documents have a publication date, it's also possible to plot them over time. This enables us to quickly see which resources are the oldest, and thus potentially outdated – we might not want to refer to them when advising someone about the most relevant learning material. Being able to see the most recent resources is useful for those who want to catch up on the newest functionalities in OpenRefine, such as Wikimedia Commons integration.
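A minimal sketch of this kind of date-based overview, assuming each document is available as a (title, publication date) pair. The helper names, the sample documents and the cutoff date are all invented for illustration.

```python
from collections import Counter
from datetime import date


def documents_per_year(documents):
    """Count documents per publication year, oldest year first."""
    counts = Counter(published.year for _, published in documents)
    return sorted(counts.items())


def published_before(documents, cutoff):
    """Titles of documents published before the cutoff date, oldest first."""
    older = [(published, title) for title, published in documents
             if published < cutoff]
    return [title for _, title in sorted(older)]


# Invented sample documents as (title, publication date) pairs.
DOCUMENTS = [
    ("Slide deck A", date(2019, 5, 3)),
    ("Blog post B", date(2023, 11, 20)),
    ("Video C", date(2019, 10, 1)),
    ("Tutorial D", date(2024, 2, 14)),
]

per_year = documents_per_year(DOCUMENTS)
possibly_outdated = published_before(DOCUMENTS, date(2022, 1, 1))
```

The per-year counts are what a plot over time would be drawn from, while the cutoff filter gives a quick list of resources worth re-checking before they are recommended.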

Challenges and considerations

In general, expanding our scope from Wikimedia Sverige's own data into resources from the global Wikimedia movement was a challenging but also interesting experience. It gave us an opportunity to reflect on the current practices of knowledge management within the Wikimedia movement; a necessity if we want to improve it.

The following issues became apparent during the work:

Data quality

Data degradation has proved to be an issue, especially when researching conferences and other events. This was relevant when compiling the material on OpenRefine: in order to describe a slide deck from a conference presentation, we need to input at least basic information about the event.

The further back in time we go, the higher the probability that the conference program has been moved from its original website, or even deleted altogether; there is no guarantee that a snapshot taken at an appropriate time is available in the Internet Archive. While events organized by the Wikimedia movement, and documented on one of the wikis, are relatively easy to research, those arranged by other actors can require more digging, especially if the program was published in a non-standard format or was only available to logged-in participants.

That said, events organized by the Movement are not always easy to model either. Different conferences present their programs in different formats; different yearly editions of one conference are not necessarily consistent. Crucial information, such as the language of the session (in multilingual events) or the affiliation of the speakers, might not be immediately visible. Also, post-conference documentation, such as slide decks and video recordings, is not always easy to find and link to the specific sessions.

Scope

When working with OpenRefine material, the question of scope became apparent.

The fact that multiple discrete communities are using and sharing information about the tool makes it an interesting case, and was indeed one of the reasons why it was selected as a focus topic. Some Wikimedians might not be aware of the resources provided by the library community, and vice versa, but manuals on e.g. data editing with GREL and Jython can be useful to all users, regardless of the scope of their work.

At the same time, it should be noted that this variety of available resources, produced by both Wikimedians and non-Wikimedians, did force us to reflect on the exact scope of Metabase. Yes, some educational material created with non-Wikimedians in mind is of great value to the community; an article on GREL can be valuable to any of us even if it doesn't mention Wikidata at all. But where to draw the line? There's also the question of how we should approach items such as books and scholarly articles, which are in scope of Wikidata. It might be enough to store very basic information on them in Metabase, to indicate that they exist and are about a relevant topic, but the actual detailed bibliographic information should be offloaded to Wikidata. As we mention in Setting up a Wikibase from scratch, for items that are in scope of both Metabase and Wikidata, we want to have a clear understanding of which of the projects should be "responsible".

Breadth vs. depth

When conducting the work, we had to strike a balance between describing things in such a detailed way that it's possible to provide all the information that we imagine is relevant and using our limited staff resources in an efficient way. While it's natural to try to research every single item in depth, the goal was, as mentioned previously, not to fully cover the available material, but rather to provide a broad range of examples of the sort of information Metabase could hold.

Topics

Given the purpose of Metabase, which is to make relevant Movement resources easier to locate, topics are central. Without accurate topic tagging, it will be impossible for users to sort through the material and identify what they are looking for. At the same time, we often ran into problems when trying to add keywords to presentations and the like, especially those from outside our areas of expertise. While many conference programs have topics assigned to the sessions, they are often broad, such as collaboration or GLAM. In order to identify more informative keywords, like Creative Commons licenses or art museums, you have to read the session description, which we obviously could only do if the session in question was within our area of expertise and we could understand the description.

Lack of external contributions

It should be noted that many of the challenges listed here stem from the fact that we were working on a proof of concept. As mentioned previously, we do not expect Wikimedia Sverige to actually fill Metabase with data all by ourselves. We have prepared a platform, a foundation on which the community can build. Hopefully, with some more users from outside of our organization, these problems will be resolved organically.

At the same time, we are aware that developing the initial structure of Metabase completely on our own creates certain limits. We have had a small number of people working on it, on and off, for a short period of time, which carries a risk that the modeling solutions we have developed will not prove usable to other members of the Movement. It is only once they have tested contributing to and using Metabase, and given feedback on its structure and vision, that we can know whether it actually is a useful project.

Conclusions and ideas for future work

The idea of collecting the resources created by the community is not new. For example, the Wikimedia Foundation made an attempt to aggregate the information about events and initiatives contained in the This Month in GLAM Newsletter. We are aware of and impressed by this information collection initiative, and we have started investigating ways to collaborate and make use of the data, so that it becomes accessible to more people through Metabase.

There's also been an initiative to collect the information about the different tools used in different stages of GLAM content partnerships projects, conducted as part of the preparatory work for establishing the Content Partnerships Hub. There have been attempts to catalog the many tools the Movement uses as well, such as the Toolhub and Hay's Tools Directory. Affiliates around the world have their tools and knowledge repositories, such as newsletters, wikis and blogs. All the conferences and other events we organize, from Wikimania to local meet-ups, are described in different places.

In other words, there's plenty of room both to make the work of knowledge collectors easier and to set up knowledge seekers for success. We imagine that Metabase can become a hub where all these sorts of information can be collected on a shared platform, accessible to all. Information collectors in our Movement do amazing work in less than perfect conditions, with the plethora of platforms and tools available. And collecting the information is only the beginning – it is not actually useful unless everyone interested is able to quickly and easily find what they are looking for.

However, for the platform to succeed, it is necessary that as many people as possible contribute to it. Our work with data input to date has been experimental and exploratory – we wanted to test several approaches to the data collection, from both an event and a thematic point of view. Wikimedia Sverige – or any one affiliate – does not have the skills or resources necessary to track all the activities of affiliates and volunteers around the world. We hope that we have shown that Metabase is a platform worthy of investing time and effort into, and we are looking forward to assisting those who would like to do that.