Structured data for GLAM-Wiki/Roundtripping/KMB
Metadata roundtripping: Wikidata - Wikicommons - runestone pictures
by Magnus Sälgö, user: salgo60, Twitter: salgo60, ORCID: 0000-0003-2568-267X, GitHub: salgo60/Litteraturbanken_wd_runes, start: 3 Mar 2021. This paper was presented at LD4 2021, the 2021 LD4 Conference on Linked Data in Libraries (https://ld42021.sched.com/); see video https://www.youtube.com/watch?v=GeDXzInR_mA
Connecting Wikidata (Q2013), Wikimedia Commons (Q565), RAÄ Swedish National Heritage Board (Q631844), Uppsala University (Q185246) and Swedish Literature Bank (Q10567910) using Wikidata, Wikicommons and Structured Data on Commons, which uses Wikibase and implements semantic interoperability with Wikidata that is easily accessible with SPARQL federation.
Background
A new project, Everlasting Runes (Q105378723), has been created by Swedish National Heritage Board (Q631844) and Uppsala University (Q185246); see https://app.raa.se/open/runor / urn:nbn:se:uu:diva-354747 “Everlasting Runes”: A Research Platform and Linked Data Service for Runic Research - link grant.
- every runic inscription in this application is identified in the database K-samsök, which has the property Swedish Open Cultural Heritage URI (P1260) in Wikidata
- about 3300 of those runic inscriptions are available in Wikidata and have Swedish Open Cultural Heritage URI (P1260), see SPARQL below
- pictures of runic inscriptions can be found at Swedish National Heritage Board (Q631844) in the application KMB. Many of those pictures have been uploaded to Wikicommons
- old Swedish literature is today being scanned by Swedish Literature Bank (Q10567910). The ambition is to scan all 19th century books in the next phase, and they have already scanned e.g. some classic literature about runic research such as en:Sveriges_runinskrifter - see books from Royal Swedish Academy of Letters, History and Antiquities (Q1792159) - link books - link grant.
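As a minimal sketch of the kind of query the second bullet refers to, the Python helper here only builds the SPARQL text; it does not send it anywhere. The function name is hypothetical, and requiring Scandinavian Runic-text Database ID (P1261) in addition to P1260 is an assumption made here to narrow the result to runic inscriptions. It is not necessarily the query used in the project.

```python
# Sketch only: builds SPARQL text for the Wikidata Query Service, nothing is sent.
# The wdt: prefix is assumed to be predefined by the query service.
def build_runestone_query(limit=None):
    """Return SPARQL listing items that have both P1260 and P1261."""
    lines = [
        "SELECT ?item ?raa ?signum WHERE {",
        "  ?item wdt:P1260 ?raa .     # Swedish Open Cultural Heritage URI",
        "  ?item wdt:P1261 ?signum .  # Scandinavian Runic-text Database ID (assumed filter)",
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


query = build_runestone_query(limit=10)
print(query)
```

The resulting text can be pasted into a query service editor; the optional limit is just a convenience for testing.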
Crowdsourcing starting with low-hanging fruit where we can easily trust each other
One problem we see when doing crowdsourcing is that domain experts do not always trust what a crowdsourcing community delivers. Runic inscriptions have been a never-ending interest of Swedish researchers, and I guess the "rocket scientists of the 17th century" started this work. They identified the "identification problem" early and understood that persistent unique identifiers are needed to avoid metadata debt, something the GLAM sector still does not always deliver in 2021 ;-) (see sad example from Europeana and artist "Carl Larsson"). We now have scanned books from the 1700s available online that describe the runic inscriptions with those persistent unique identifiers, and also 100-year-old books with another numbering system, which in Wikidata is Scandinavian Runic-text Database ID (P1261) --> it is rather easy for a crowdsourcing community to identify which runic inscriptions a book speaks about, as the experts have already added the unique persistent number to the object --> fewer trust issues between crowdsourcing people and domain experts. In addition, in this case the photo itself can confirm the metadata about what the picture depicts.
Runic inscriptions and structured data on Commons are a good example of how Wikipedia can add value with crowdsourcing. Taking pictures of runestones can also be a rather easy task that needs no major domain skills, as the runic inscriptions often have a sign next to them 😉 with the unique persistent number (see example sign DR 279). And in old scanned books we often see runic inscriptions mentioned by the unique numbering schema, which is easy to "translate" to the Wikidata Q-number using a property like Scandinavian Runic-text Database ID (P1261).
- domain experts such as Swedish National Heritage Board (Q631844) and Uppsala University (Q185246) can now, with the workflow presented here, easily retrieve new pictures with a SPARQL federation from Wikidata and Wikicommons (same as JSON) and select which pictures are new and add value
- by having the ID of the original picture = Swedish Open Cultural Heritage URI (P1260) in SDC - Structured Data on Commons, the domain experts can also see which pictures are new and which are pictures that they already have in the KMB collection. They can compare the metadata, and maybe Wikicommons has added something of interest such as depicts (P180) or any other fields...
- the Uppsala University (Q185246) database has been matched using Alvin ID (P6821), see phab:T225522, and my hope is that we in the future can do the same with the database for "typed species", see Phab:T236310, as this database has many of the first samples collected by Carl Linnaeus (Q1043).
- the electronic library of Swedish books, Swedish Literature Bank (Q10567910), has created a literature map, and my hope is that they start using the work we have done to add e.g. runic inscriptions to the literature map with references to books they created, see
- Wikidata generated map of just runestones with a reference to a page at Swedish Literature Bank (Q10567910)
- same map but with marked cluster layout
- the map that Swedish Literature Bank (Q10567910) maintains, which references all the books they have scanned (just one part of Sweden has been released)
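The comparison described in the list above (which Commons pictures are new, and which the institution already has in KMB) can be sketched as a simple set lookup on the P1260 value stored in SDC. This is only an illustration: the function name, the file names and the URIs are hypothetical placeholders, not real KMB data.

```python
# Sketch only: split Commons files into "new" and "already in KMB"
# based on the Swedish Open Cultural Heritage URI (P1260) stored in SDC.
def split_new_and_known(commons_files, kmb_uris):
    """commons_files maps a Commons file name to its P1260 value, or None."""
    known = set(kmb_uris)
    new_pictures, already_in_kmb = [], []
    for name, uri in sorted(commons_files.items()):
        if uri in known:
            already_in_kmb.append(name)
        else:
            new_pictures.append(name)
    return new_pictures, already_in_kmb


# Hypothetical example data
commons_files = {
    "Example runestone A.jpg": "raa/kmb/EXAMPLE-1",  # uploaded from KMB
    "Example runestone B.jpg": None,                 # own work, no KMB id
    "Example runestone C.jpg": "raa/kmb/EXAMPLE-9",  # id not in this KMB export
}
kmb_uris = ["raa/kmb/EXAMPLE-1", "raa/kmb/EXAMPLE-2"]

new_pictures, already_in_kmb = split_new_and_known(commons_files, kmb_uris)
print(new_pictures)
print(already_in_kmb)
```

For the pictures that are already in KMB, the same key can be used to compare the metadata on both sides.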
What has been done
Since the project SDC (video) - Structured Data on Commons has been implemented and has its own Wikibase, it opens up new possibilities for data roundtripping with pictures: one SPARQL query can get information from both Wikidata and pictures in Wikimedia Commons. This access to machine-readable data about WD objects, and about objects in Wikicommons that depict those WD objects, will be a big step towards more advanced data roundtripping. It also makes it possible to trace the original source of a picture in an easy, machine-readable way, and to compare whether the Wikicommons depicts (P180) statement describes the same THING as an external source states the picture depicts.
By uploading pictures to Wikicommons and adding depicts in SDC, a community like Wikipedia can maybe start adding value for domain specialists by giving them the possibility to decide whether the pictures add value for them or not. Long term, maybe we can use the knowledge of domain specialists who confirm that a picture they have, and that we also store in Wikicommons, depicts what we state in Wikicommons. Maybe we need Signed statements in Wikicommons, see EPIC, so we can see that the depicts in Wikicommons was what the uploading institution stated.
The result is that we with a SPARQL/query (8725 pictures / same on a map) can easily fetch Wikicommons pictures of Runestones together with the RAÄ id for the runestone in the application Everlasting Runes (Q105378723), and also see if the picture was originally uploaded from RAÄ....
What I have done
- Created some more Wikidata objects for Runestones
- Wikicommons: started adding depicts to pictures of Runestones in SDC - about 8200 pictures
- Wikicommons: started moving the information of the KMBid to SDC - about 34 600 pictures (not just Runestones)
- Created SPARQL that finds all Runestones in Wikidata that are connected to RAÄ and finds all the pictures in Wikicommons that depict any of those Runestones in SDC
- Pictures / Map - 8200 pictures
- Pictures of Runestones that are from KMB/RAÄ / on a map - 4162 pictures
- Pictures of Runestones with source of file (P7482) = original creation by uploader (Q66458942) on a map - 2727 pictures
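A federated query of the kind listed above could look roughly like the following sketch. It only assembles the SPARQL text for the Commons query service (WCQS), with a SERVICE clause pointing at the Wikidata endpoint. The function name is hypothetical, the wd:/wdt: prefixes are assumed to be predefined by the endpoint, and the use of P1261 to narrow the result to runic inscriptions is an assumption; the queries actually used in the project may differ.

```python
# Sketch only: assemble a federated SPARQL query (Commons SDC + Wikidata).
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_federated_query(only_original_uploads=False):
    """Files on Commons that depict an item having P1260 in Wikidata."""
    parts = [
        "SELECT ?file ?item ?raa WHERE {",
        "  ?file wdt:P180 ?item .  # depicts, stored in SDC",
    ]
    if only_original_uploads:
        # source of file (P7482) = original creation by uploader (Q66458942)
        parts.append("  ?file wdt:P7482 wd:Q66458942 .")
    parts += [
        f"  SERVICE <{WIKIDATA_ENDPOINT}> {{",
        "    ?item wdt:P1260 ?raa .     # Swedish Open Cultural Heritage URI",
        "    ?item wdt:P1261 ?signum .  # Scandinavian Runic-text Database ID (assumed filter)",
        "  }",
        "}",
    ]
    return "\n".join(parts)


print(build_federated_query())
print(build_federated_query(only_original_uploads=True))
```

The optional flag corresponds to the last variant in the list, pictures where the uploader is the original creator.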
Issues
- SPARQL using WCQS is only updated once a week and is in Beta, see SPARQL in the shadow of Structured Data on Commons
- it is more difficult than expected to find "all" Wikicommons pictures depicting a Rune - issues/6
- SDC is new and I guess most pictures will not have depicts; I guess we need better support for adding depicts in Wikicommons...
- today we have no dialogue with RAÄ/Uppsala about how we describe a picture, e.g. that this is a picture from behind etc... issues/12 maybe not necessary
- some changes need to be made to how I have added data to SDC, see Kanban board and last status report (Swedish)
Next step
- my wish is that we can copy this to other areas than Runestones. Runestones have been a low-hanging fruit as they have good structure and good numbering schemas, and most Runestones are rather easy to identify when you take a picture (if they have a sign next to them)
- we have tested connecting FactGrid - (FactGrid (Q90405608) - FactGrid item ID (P8168)), a Wikibase installation set up by Gotha Research Centre Germany (video). They do research about people at the Swedish Order of Freemasons, and the lesson learned is that you much faster run into cases where you need much more domain skills to do the same, see phab:T266745, where I try to match research done at University of Gothenburg (Q371522) by Andreas Önnerfors (Q6257088) with people in Wikidata, see academic paper. Runestones with unique identifiers are proof that Linked data is a big step in the right direction when we connect different domains of information.
- one maybe low-hanging fruit is the Uppsala University collection of typed species, i.e. the first collected and documented samples, see Phab:T236310. They will add great value, and an interesting use case I can see is that you combine this dataset with the data you have in an app like iNaturalist, and when you visit the museum you get a notification that this is the first sample of the same species you found 2 years ago using iNaturalist...
- iNaturalist is very well connected to Wikidata with 3 properties
- iNaturalist taxon ID (P3151) 588 348 taxa in iNaturalist are connected to Wikidata e.g. Echinacea purpurea (Q272661) same as iNaturalist 48627
- iNaturalist place ID (P7471) 4 380 places are connected like Garnudden Nature Reserve (Q18290091) same as iNaturalist place 152114 same as OpenStreetMap 10631430
- iNaturalist observation ID (P5683) less used e.g. 14693435
- real life experience --> that RAÄ Swedish National Heritage Board (Q631844) and Swedish Literature Bank (Q10567910) start using Wikidata data so that we get feedback on how to improve the workflow, quality, entity schemas?
- Runestones are rather static, especially if we concentrate on Runestones identified in books from 1750 and/or 100-year-old books. One big next step is what I suggested we should do with Europeana: looking into "Change management of entities created and deleted in..." see phab:T251225... i.e. a museum has a new artist that is uploaded to Europeana and we need to match it to Wikidata and vice versa...
- Better understanding of Linked data and persistent identifiers in GLAM. Many museums have pictures of runic inscriptions BUT nearly no one uses Linked data or has specific runic inscription objects.... what we see is text strings, and when the data is moved to Europeana it is also moved as Swedish text strings, so the people reading Europeana are shown what I call a rather empty object with metadata debt
- Example Europeana runestone picture with text strings S_FBM_photo_2M16_S_0096_107_87
- The Europeana object S_FBM_photo_2M16_S_0096_107_87 has very bad metadata with no identification of the Runestone
- Correct would be same as
- Vg 130 --> same as Wikidata property Scandinavian Runic-text Database ID (P1261)
- B 980 --> that is the identifier from 1750 that can be found in this scanned book page 272
- compare Wikidata object runic inscription Vg 130 (Q29576301) in more languages en zh ar bn sv
- both "Everlasting Runes – a research platform for Sweden’s runic inscriptions" and "New Paths to the Past. Literary Cultural Heritage as Source Material for the Humanities and Social Sciences" are getting grants from Bank of Sweden Tercentenary Foundation, but what is delivered is SILOS with no semantic interoperability
- in 2021 we need to start defining Linked data and Semantic interoperability in the project plans as a requirement, otherwise we get SILOS as described above with Europeana, Swedish Literature Bank (Q10567910) and RAÄ Swedish National Heritage Board (Q631844).
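The difference between a text string and an identifier can be illustrated with a small sketch that looks for runestone identifiers such as Vg 130 or DR 279 inside free-text metadata. The province code list is only an illustrative subset, the function name is hypothetical, and a regular expression like this is a rough aid for finding candidates, not a replacement for the institution storing the identifier itself.

```python
import re

# Sketch only: illustrative subset of province codes, the real list is longer.
PROVINCE_CODES = ("DR", "Sm", "Sö", "Vg", "Ög", "Öl", "U", "G")

# A code, an optional space and a number, not glued to other word characters.
SIGNUM = re.compile(r"(?<!\w)(" + "|".join(PROVINCE_CODES) + r")\s?(\d+)(?!\w)")


def find_signa(text):
    """Return identifiers like 'Vg 130' found in a free-text description."""
    return [f"{code} {number}" for code, number in SIGNUM.findall(text)]


print(find_signa("Runsten Vg 130 vid kyrkan"))
print(find_signa("Runsten, foto"))
```

A description without any identifier gives an empty list, which is the "rather empty object" case described in the list above.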
Smaller steps
- add links and pictures for each runestone in the book Bautil from 1750 (Kanban board)
- map Bautil connected runestones and if we have in Wikicommons the picture
- images from Bautil
- "better" SDC data
- we have some link rot with KMB - issues/14
- see if there is a need to describe pictures better like what part is displayed - issues/12
- restructure file source Property:P7482 - issues/11
- see if we can add more sources and add more sources for pictures using file source - issues/8
- Digitaltmuseum - WD Property:P7847
- Gotlands museum - WD Property:P7068
- KMB - WD Property:P1260
- Malmö museum - WD Property:P8773
- National Archives Tora - WD Property:P4820
- Uppsala University Alvin - WD Property:P6821
- ????
Outside the scope
- new Wikidata property: a WD property proposal for a dedicated property for "Everlasting runes" has been written, as I think we lose functionality when one aggregator (Wikidata) links to another aggregator, K-samsök/Swedish Open Cultural Heritage URI (P1260). But that is outside the scope of this initiative, and I feel it is more in the interest of institutions outside Wikidata how and if they will use Wikidata for Runestones and how they would like to link "Everlasting runes".
- en:Wikipedia/sv:Wikipedia: no linking using added Wikidata objects/properties will be done to Swedish Literature Bank or Everlasting runes in articles in en:Wikipedia or sv:Wikipedia
- no "Wikidata driven" templates will be developed in en:Wikipedia or sv:Wikipedia using the added data
RAÄ benefits with Dataroundtripping
TBA ?!??!
Swedish Literature Bank - link to a page in a book about a Runestone
Swedish Literature Bank (Q10567910) is a Swedish project scanning all Swedish books from the 19th century. They have also started to create a Literature map (litteraturkartan.se) with places related to literature
- In the project salgo60/Litteraturbanken_wd_runes we have connected more than 2000 runestones in Wikidata to a page in a book available at Swedish Literature Bank --> we can easily display different books on a map, and maybe this can be reused at Litteraturbanken or at RAÄ as a reference to classic books about Runestones, or on the Literature map. See example of a Wikidata Map query with runestones described by a book (as a list), with the possibility to filter on one or more specific books.
- same but map with just English books
- Example how a page in a book is referenced in Wikidata for a Runestone Uppland Runic Inscription 51 (Q18334422) using described by source (P1343)
Swedish Literature Bank benefits with Dataroundtripping
TBA Swedish Literature Bank ?!??!
Next step Dataroundtripping Swedish Literature Bank <-> Wikidata - position in a book and location on a map
I would like to see the map application (litteraturkartan.se) and Wikidata start to share information.
- we now have the books of Litteraturbanken in Wikidata
- we can add a reference to a page in a book using Swedish Literature Bank edition (P5123)
- we can also say same as a point on the map using Swedish Literature Bank place ID (P9213)
Challenges I see
- the map Swedish Literature Bank has is more coordinates than Things, i.e. they don't tell if it is the house, grave or something else they reference
- sometimes an author is related to a place because they spent a summer there, e.g. Karin Boye (Q237413) spent time at Almnäs Castle (Q1614050) --> point=322 and article=327. How do we describe that in Wikidata?
- we miss active communication with the Swedish Literature Bank. I have spoken with several people like Mats Malm (Q16633253), dev Roxendal, Paulina Helgeson (Q96781783) and also tried to contact Ljubica Miočević (Q105727439), but we miss a good dialogue and I'm unsure if they share my view of the benefits of using Linked data. See my GitHub monologue issues/29 / issues/25 ... see also the vision/history of the Swedish Literature Bank in Swedish - Salgo60 (talk) 04:37, 27 March 2021 (UTC)
- in the grant from "Bank of Sweden Tercentenary Foundation" (link grant) it is stated that the project Everlasting Runes should deliver what we have done in Wikidata, quote: "The research platform will link the published parts of the series Sveriges runinskrifter with the Scandinavian Runic-text Database and make it possible to use both these sources together." As I haven't seen this done, I guess the project could use the data in Wikidata, and even better if they start using Wikibase and implement it like we have done with Wikicommons to get better semantic interoperability between Wikidata and the project "Everlasting Runes".
Misc
GITHUB
- salgo60/Litteraturbanken_wd_runes
- Swedish Literature Bank - spraakbanken/littb-frontend
- RAÄ / Uppsala Evighetsrunor uppsala-university/Evighetsrunor
Entity Schema
To get better quality in Wikidata we have also developed a schema for Runestones, EntitySchema:E290. My wish is that we will also use schemas in Wikicommons for Runestone pictures.
Runestones and unique persistent identifiers since 1750
One lesson learned is that Runestone research has been done in Sweden since before 1700, and as early as 1750 a book was published, Bautil (Q10427451), that had its own numbering schema Bautil 1-1173, see book and map (work in progress). In 1900 a multi-volume catalogue was published, en:Sveriges_runinskrifter, that used the numbering schema we find in Wikidata property Scandinavian Runic-text Database ID (P1261). In this book we can see that they reference the older numbering schema from the book Bautil and many more, i.e. it feels like "early Linked data" ;-) All those persistent identifiers make it rather easy for me as a non domain specialist to connect pages in those books to Wikidata objects about the Runestones.
- Book Swedish Runic Inscriptions from 1958
- Book Bautil from 1750
- Wikidata referencing those books
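How the chained numbering schemas make it possible to get from a book reference to a Wikidata object can be sketched with two small lookup tables. The single example row (B 980, Vg 130, Q29576301) is the one given earlier on this page; the table and function names are hypothetical, and in practice the mappings would come from Wikidata rather than being hard-coded.

```python
# Sketch only: resolve an identifier found in a book to a Wikidata Q-number.
# Bautil number (1750) -> Scandinavian Runic-text Database ID (P1261)
BAUTIL_TO_SIGNUM = {"B 980": "Vg 130"}
# Scandinavian Runic-text Database ID (P1261) -> Wikidata item
SIGNUM_TO_QID = {"Vg 130": "Q29576301"}


def resolve(identifier):
    """Return the Q-number for a Bautil number or a P1261 value, else None."""
    signum = BAUTIL_TO_SIGNUM.get(identifier, identifier)
    return SIGNUM_TO_QID.get(signum)


print(resolve("B 980"))
print(resolve("Vg 130"))
print(resolve("B 1"))
```

An identifier missing from the tables resolves to None, which marks a runestone that still needs an object or a mapping.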
Wikipedia pages and page views on runestones related articles
- sv:Wikipedia 1440 articles and 1190 views per day link
- en:Wikipedia 758 articles and 6124 views per day link
- de:Wikipedia 325 articles and 490 views per day link
- the number of languages writing about a runestone is about 43; the number of runestones with more than 5 articles = 49 runestones
- the number of languages with an article about one of the most famous runestones, Rök Runestone (Q472975), is 25, with 215 daily viewers
Lesson learned
- persistent identifiers, as the Runestones domain has had since before 1750, make information easier to access and understand also for a non domain expert like me; compare me trying to help FactGrid
- if people as early as 1750 could use a numbering schema and unique persistent identifiers, why can't we in 2021 start sharing pictures with unique persistent identifiers from all sources and, like the book from 1958, reference other sources using those persistent unique identifiers? I.e. when we upload pictures, also store the history of the picture so that we can "backtrack its original source" and also find out if any of the "previous locations" of the picture has added some useful trusted metadata after the picture was uploaded...
- Recommended reading DOI: 10.5334/dsj-2019-054 "Proper Attribution for Curation and Maintenance of Research Collections: Metadata Recommendations of the RDA/TDWG Working Group"
Notebook examples - SPARQL
- draft KMB dataroundtripping.ipynb has some problems with showing pictures in WCQ
- example book "The Wonderful Adventures of Nils" in Wikidata linking Swedish Literature Bank (Q10567910) and the books The Wonderful Adventures of Nils Volume 1 (Q100528488) and The Wonderful Adventures of Nils Volume 2 (Q100621723)
- same but book Bellman was there.... (Q101541514)
- WD "Runestones" connected to books in Swedish Literature Bank (Q10567910)
- related is Literature Signs in Stockholm
More reading
- Tim Berners-Lee: The next Web of open, linked data - give us Raw data
- Commons:Structured data - video
- "One way to design a system to be a good external identifier in Wikidata"