WikiCite 2016/Report/Group 8
Group 8: Citoid-Wikidata integration
- Room 121 (subgroup)
- Etherpad: Group 8 (parent Etherpad)
Attendees
Alphabetical by first letter
- Alex Kalderimis (RefMe)
- Katie Filbert (Wikimedia Deutschland, Wikidata)
- Marielle Volz (Wikimedia Foundation) (attending remotely)
- Philipp Zumstein (Universitätsbibliothek Mannheim (Mannheim University Library))
- Sebastian Karcher (Qualitative Data Repository / Zotero, Citation Style Language (CSL))
Links
- Proposal: https://meta.wikimedia.org/wiki/WikiCite_2016/Proposals/Citoid_integration_for_Wikidata
- Example Call for the Citoid API: https://citoid.wikimedia.org/api?format=zotero&search=http%3A%2F%2Flink.springer.com%2Fchapter%2F10.1007%2F11926078_68
- https://citoid.wikimedia.org/
- Citoid Codebase on github: https://github.com/wikimedia/citoid/
- Citoid calling CrossRef: https://github.com/wikimedia/citoid/blob/master/lib/Scraper.js#L293
- Example translator: https://github.com/wikimedia/citoid/blob/master/lib/translators/openGraph.js
- DC translator test: https://github.com/wikimedia/citoid/blob/master/test/features/unit/translators/dublinCore.js#L8
- Full scraper tests: https://github.com/wikimedia/citoid/blob/master/test/features/unit/scraper.js
Tasks
Subtask 1: Extend the Configuration
Current version: https://github.com/filbertkm/wikidata-refs/blob/master/template.json
List of Zotero item types and fields:
List of available properties
- https://tools.wmflabs.org/sqid/#/
- https://tools.wmflabs.org/hay/propbrowse/
- https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData#Properties
Templates:
Zotero Field to Wikidata property mapping for itemType journalArticle:
- See list of fields here: http://aurimasv.github.io/z2csl/typeMap.xml#map-journalArticle
"title": { // this should really go in the "label" field, not be a property.
"id": "P1476",
"valuetype": "monolingualtext"
},
"url": {
"id": "P973",
"valuetype": "string"
},
"date":{
"id": "P577",
"valuetype": "time"
},
"DOI": {
"id": "P356",
"valuetype": "external identifier"
},
"volume": {
"id": "P478",
"valuetype": "string"
},
"issue": {
"id": "P433",
"valuetype": "string"
},
"URL": {
"id": "P854",
"valuetype": "URL"
},
"PMID": {
"id": "P698",
"valuetype": "external identifier"
},
"PMCID": {
"id": "P932",
"valuetype": "external identifier"
},
"seriesTitle": { // see series
"id": "P1433->P1476",
"valuetype": "string"
},
// The types below are items, and it may not be possible to fully represent what is needed for them in JSON
"publicationTitle": {
"id": "P1433",
"valuetype": "item" //match via ISSN?
},
"author": {
"id": "P50", // expects items; combine firstName and lastName for the label. Multiple items will also need to be created here.
"valuetype": "item"
},
"editor": {
"id": "P98",
"valuetype": "item"
},
"rights": {
"id": "P275",
"valuetype": "item" //match a string
},
"language": {
"id": "P407", // watch the deletion/merge discussion at https://www.wikidata.org/wiki/Property:P407
"valuetype": "item"
},
"series": {
"id": "P1433",
"valuetype": "item" // Journal
},
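The mapping above could be applied to a Zotero item roughly as follows. This is a minimal sketch, not the group's implementation: `MAPPING` copies only a few entries from the template, and the function name `zotero_to_claims` and the shape of the returned claims are assumptions made for illustration. Fields whose valuetype is `item` are skipped, because they need entity lookup or creation.

```python
# Hypothetical sketch: turn simple Zotero fields into Wikidata-style claims.
# MAPPING repeats a few entries from the template; the output shape is assumed.
MAPPING = {
    "DOI": {"id": "P356", "valuetype": "external identifier"},
    "volume": {"id": "P478", "valuetype": "string"},
    "issue": {"id": "P433", "valuetype": "string"},
    "PMID": {"id": "P698", "valuetype": "external identifier"},
    "publicationTitle": {"id": "P1433", "valuetype": "item"},
}


def zotero_to_claims(item, mapping=MAPPING):
    """Return (claims, unmapped) for one Zotero item dict."""
    claims = []
    unmapped = []
    for field, value in item.items():
        if value in ("", None):
            continue  # skip fields that carry no value
        rule = mapping.get(field)
        if rule is None or rule["valuetype"] == "item":
            # Unknown fields and item-valued fields are left for later handling.
            unmapped.append(field)
            continue
        claims.append(
            {"property": rule["id"], "valuetype": rule["valuetype"], "value": value}
        )
    return claims, unmapped
```

The second return value lists the fields that still need attention, which makes gaps in the configuration visible while it is being extended.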
Problems:
- Handling agent names (personal and institutional names):
- Authors are represented as Items. An item needs to be created for each author, with the appropriate properties set. Note that this may involve duplication of entities, where matching items cannot be resolved.
- "Published in" (P1433) also requires an item, since the journal is itself an item.
- This matters particularly because it makes sense to search for these items.
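The label-building step for agent names could look like the sketch below. It assumes creators arrive in the usual Zotero shape (`creatorType` plus either `firstName`/`lastName` or a single `name` field for institutions); the helper names `creator_label` and `labels_by_role` are hypothetical. Matching a label to an existing item, or creating a new one, is deliberately left out.

```python
def creator_label(creator):
    """Build a display label from one Zotero creator object."""
    if creator.get("name"):
        # Single-field mode, typically used for institutional names.
        return creator["name"].strip()
    parts = [creator.get("firstName", ""), creator.get("lastName", "")]
    return " ".join(p.strip() for p in parts if p and p.strip())


def labels_by_role(creators):
    """Group labels by creatorType, keeping the original creator order."""
    grouped = {}
    for creator in creators:
        label = creator_label(creator)
        if label:
            role = creator.get("creatorType", "author")
            grouped.setdefault(role, []).append(label)
    return grouped
```

Keeping the original order matters because author order is meaningful in a citation, and grouping by role lets authors and editors be sent to different properties.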
Possible model for handling items where the valuetype is an item:
"publicationTitle": {
"id": "P1433",
"valuetype": "item", // match via ISSN?
"item": { // fields corresponding to the item; the value of publicationTitle is implied as the label
"ISSN": { // the ISSN field belongs to the journal, not the journalArticle
"id": "",
"valuetype": ""
}
}
},
ISSN should be used to link articles with their journals, which can then be queried with SPARQL (Wikidata has hierarchical types and a graph model). The same should be true of `series`, `seriesTitle`, `shortTitle`, `libraryCatalog`, and `issue`, which are properties on other entities.
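A journal lookup by ISSN might be phrased as a SPARQL query along these lines. This is a sketch under assumptions: it only builds the query string, it assumes P236 is the ISSN property and that the `wdt:` prefix is predefined by the query service, and the function name `issn_lookup_query` is hypothetical. The `LIMIT 2` is a design choice so that a caller can tell an ambiguous match from a unique one.

```python
import re

# Basic shape check only; the ISSN check digit is not validated here.
ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")


def issn_lookup_query(issn):
    """Return a SPARQL query string that finds items with the given ISSN."""
    normalized = issn.strip().upper()
    if not ISSN_PATTERN.match(normalized):
        raise ValueError(f"not a well-formed ISSN: {issn!r}")
    return (
        "SELECT ?journal WHERE { "
        f'?journal wdt:P236 "{normalized}" . '
        "} LIMIT 2"
    )
```

Validating the ISSN before interpolating it keeps malformed input out of the query text.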
Subtask 2: Improve ID import into Citoid
Additional IDs for Wikidata
- e.g. JSTOR, OCLC, arXiv ID, IMDb, MR (Mathematical Reviews)
- We are usually aware of these (e.g. when using Citoid on JSTOR or arXiv) but are not importing them. Zotero puts them in the extra field, and Citoid already parses that field for PMID. Add some more to Zotero's extra field and add more parsing to Citoid.
- idea:
- pack some of these IDs in the `extra` field in the Zotero translator,
- make sure Citoid understands that
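The idea above could be sketched as a small parser for `Label: value` lines in the `extra` field. This is illustrative only: Citoid itself is written in JavaScript, the label spellings in `KNOWN_LABELS` are assumptions (only PMID is mentioned above as already parsed), and `parse_extra` is a hypothetical name.

```python
import re

# Assumed label spellings; only PMID is confirmed by the notes above.
KNOWN_LABELS = {
    "pmid": "PMID",
    "pmcid": "PMCID",
    "jstor": "JSTOR",
    "oclc": "OCLC",
    "arxiv": "arXiv",
}
LINE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(\S.*?)\s*$")


def parse_extra(extra):
    """Split an extra field into recognised IDs and leftover free-text lines."""
    ids = {}
    leftover = []
    for line in extra.splitlines():
        match = LINE_PATTERN.match(line)
        key = match.group(1).lower() if match else None
        if key in KNOWN_LABELS:
            # Keep the first value if a label is repeated.
            ids.setdefault(KNOWN_LABELS[key], match.group(2))
        else:
            leftover.append(line)
    return ids, leftover
```

Returning the leftover lines separately means free-text notes in `extra` are preserved rather than silently dropped.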
Support non-CrossRef DOIs in Citoid
- Options
- add DataCite translator
- use DOI.org API
- test via http://doi-cache.dissem.in/
- make post request with accept header set to application/citeproc+json
- Possible issues: different formats from different agencies
- would still need to rewrite translator, which currently relies on COinS (!) in API response
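The DOI.org option could start from a request like the one sketched here. It only constructs the request and does not send it. Assumptions: the resolver at `https://doi.org/` honours the Accept header, a GET request is sufficient (switch `method` if the target API requires otherwise), and the helper name `build_doi_request` is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def build_doi_request(doi, accept="application/citeproc+json", method="GET"):
    """Build (but do not send) a content-negotiation request for a DOI."""
    # Keep the slash between prefix and suffix; escape anything unsafe.
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": accept}, method=method)
```

Passing the result to `urllib.request.urlopen` would perform the call. Response handling is left out because, as noted above, the format may differ between registration agencies.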
Integrate general translators (Metadata, Highwire etc.) instead of duplicating these functions
- Citoid Tools: https://github.com/wikimedia/citoid/tree/master/lib/translators
- It seems that the corresponding Zotero translators are not working
- reached out to Zotero on this; exploring further