WikiCite 2016/Report/Group 8

Group 8: Citoid-Wikidata integration

Room 121 (subgroup)
Etherpad: Group 8 (parent Etherpad)

Attendees


Alphabetical by first name

  1. Alex Kalderimis (RefMe)
  2. Katie Filbert (Wikimedia Deutschland, Wikidata)
  3. Marielle Volz (Wikimedia Foundation) (attending remotely)
  4. Philipp Zumstein (Universitätsbibliothek Mannheim (Mannheim University Library))
  5. Sebastian Karcher (Qualitative Data Repository / Zotero, Citation Style Language (CSL))

Tasks


Subtask 1: Extend the Configuration


Current version: https://github.com/filbertkm/wikidata-refs/blob/master/template.json

List of Zotero item types and fields:

List of available properties

Templates:

Zotero Field to Wikidata property mapping for itemType journalArticle:

See list of fields here: http://aurimasv.github.io/z2csl/typeMap.xml#map-journalArticle
"title": { // this should really go in the "label" field, not be a property
    "id": "P1476",
    "valuetype": "monolingualtext"
},
"url": {
    "id": "P973",
    "valuetype": "URL"
},
"date": {
    "id": "P577",
    "valuetype": "time"
},
"DOI": {
    "id": "P356",
    "valuetype": "external identifier"
},
"volume": {
    "id": "P478",
    "valuetype": "string"
},
"issue": {
    "id": "P433",
    "valuetype": "string"
},
"URL": {
    "id": "P854",
    "valuetype": "URL"
},
"PMID": {
    "id": "P698",
    "valuetype": "external identifier"
},
"PMCID": {
    "id": "P932",
    "valuetype": "external identifier"
},
"seriesTitle": { // see "series"
    "id": "P1433->P1476",
    "valuetype": "string"
},
// The types below are items, and it may not be possible to fully represent what they need in JSON
"publicationTitle": {
    "id": "P1433",
    "valuetype": "item" // match via ISSN?
},
"author": {
    "id": "P50", // expects items; combine firstName and lastName for the label. We will also get multiple items to create here.
    "valuetype": "item"
},
"editor": {
    "id": "P98",
    "valuetype": "item"
},
"rights": {
    "id": "P275",
    "valuetype": "item" // match a string
},
"language": {
    "id": "P407", // watch the deletion/merge discussion at https://www.wikidata.org/wiki/Property:P407
    "valuetype": "item"
},
"series": {
    "id": "P1433",
    "valuetype": "item" // journal
},
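To make the mapping concrete, here is a minimal Python sketch (not Citoid's or wikidata-refs' actual code) that applies a subset of it to a Zotero journalArticle record and emits (property, valuetype, value) triples. It assumes the record is a plain dict keyed by Zotero field names, uses P1476 (Wikidata's "title" property) for the title, and leaves out item-valued fields, which need an entity lookup first. The `to_claims` name and the `lang` default are hypothetical.

```python
from typing import Any

# Hypothetical subset of the mapping: only fields whose values are literals.
# Item-valued fields (author, publicationTitle, ...) need an entity lookup
# first and are left out here.
MAPPING: dict[str, dict[str, str]] = {
    "title": {"id": "P1476", "valuetype": "monolingualtext"},
    "date": {"id": "P577", "valuetype": "time"},
    "DOI": {"id": "P356", "valuetype": "external identifier"},
    "volume": {"id": "P478", "valuetype": "string"},
    "issue": {"id": "P433", "valuetype": "string"},
    "PMID": {"id": "P698", "valuetype": "external identifier"},
    "PMCID": {"id": "P932", "valuetype": "external identifier"},
}


def to_claims(zotero_item: dict[str, Any], lang: str = "en") -> list[tuple[str, str, Any]]:
    """Return (property id, valuetype, value) triples for mapped, non-empty fields."""
    claims = []
    for field, target in MAPPING.items():
        value = zotero_item.get(field)
        if not value:
            continue  # field missing or empty in this Zotero item
        if target["valuetype"] == "monolingualtext":
            # Monolingual text needs a language code; the caller supplies it
            # because Zotero's "language" field is free text (assumption).
            value = {"text": value, "language": lang}
        claims.append((target["id"], target["valuetype"], value))
    return claims


item = {
    "itemType": "journalArticle",
    "title": "An example article",
    "DOI": "10.1000/xyz123",
    "volume": "12",
}
for claim in to_claims(item):
    print(claim)
```

Keeping item-valued fields out of this step mirrors the split in the config: literals can be written directly, while items need matching or creation first.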

Problems:

  • Handling agent names (personal and institutional names):
    • Authors are represented as Items. An item needs to be created for each author, with the appropriate properties set. Note that this may involve duplication of entities, where matching items cannot be resolved.
    • `published in` (P1433) also requires an item: the journal is itself an item.
  • Particularly since it makes sense to search for these items
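A minimal sketch of the author/editor step, assuming Zotero's creator JSON shape (`firstName`/`lastName`, or a single `name` for institutions). The function names and the `known` index are hypothetical: the index stands in for a real Wikidata search, and matching on exact labels is deliberately naive, which is exactly where the duplication problem above comes from.

```python
from typing import Optional

# Zotero creator types mapped to Wikidata properties (author P50, editor P98).
CREATOR_PROPERTIES = {"author": "P50", "editor": "P98"}


def creator_label(creator: dict[str, str]) -> str:
    """'firstName lastName', or the single 'name' field used for institutions."""
    if creator.get("name"):
        return creator["name"].strip()
    parts = (creator.get("firstName", "").strip(), creator.get("lastName", "").strip())
    return " ".join(p for p in parts if p)


def plan_creator_items(
    creators: list[dict[str, str]], known: dict[str, str]
) -> list[tuple[str, str, Optional[str]]]:
    """Return (property, label, existing QID or None) per author/editor.

    `known` is a hypothetical casefolded-label -> QID index standing in for a
    real Wikidata search; None means a new item would have to be created.
    """
    plan = []
    for creator in creators:
        prop = CREATOR_PROPERTIES.get(creator.get("creatorType", ""))
        if prop is None:
            continue  # other creator types are not mapped yet
        label = creator_label(creator)
        plan.append((prop, label, known.get(label.casefold())))
    return plan


creators = [
    {"creatorType": "author", "firstName": "Jane", "lastName": "Doe"},
    {"creatorType": "author", "name": "Example Consortium"},
    {"creatorType": "editor", "firstName": "John", "lastName": "Roe"},
]
known = {"jane doe": "Q123"}  # hypothetical QID, for illustration only
print(plan_creator_items(creators, known))
```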

Possible model for handling items where the valuetype is an item:

    
"publicationTitle": {
    "id": "P1433",
    "valuetype": "item", // match via ISSN?
    "item": { // fields belonging to the linked item; the publication title value is implied as its label
        "ISSN": { // the ISSN field belongs to the journal, not the journalArticle
            "id": "",
            "valuetype": ""
        }
    }
},
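One way to read the nested model: the field's value labels a linked entity, and the nested fields describe that entity rather than the article. A hypothetical Python sketch of that interpretation follows; `linked_entities` and the sample record are illustrative, and P236 is filled in as Wikidata's ISSN property only for the example.

```python
# The nested model as a Python dict. P236 (Wikidata's ISSN property) is
# filled in only for illustration; the config leaves the id empty.
CONFIG = {
    "publicationTitle": {
        "id": "P1433",
        "valuetype": "item",
        "item": {"ISSN": {"id": "P236", "valuetype": "external identifier"}},
    }
}


def linked_entities(zotero_item: dict, config: dict) -> dict:
    """Describe, per item-valued property, the entity the article should link to.

    The field's own value becomes the entity's label; nested fields are read
    from the same Zotero record but attached to the linked entity.
    """
    result = {}
    for field, spec in config.items():
        if spec.get("valuetype") != "item" or not zotero_item.get(field):
            continue
        claims = {
            sub["id"]: zotero_item[sub_field]
            for sub_field, sub in spec.get("item", {}).items()
            if zotero_item.get(sub_field)
        }
        result[spec["id"]] = {"label": zotero_item[field], "claims": claims}
    return result


article = {
    "itemType": "journalArticle",
    "publicationTitle": "Journal of Examples",
    "ISSN": "0317-8471",
}
print(linked_entities(article, CONFIG))
```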

The ISSN should be used to link articles to their journal items, which can afterwards be queried with SPARQL (Wikidata has a graph model with hierarchical types). The same should be true of `series`, `seriesTitle`, `shortTitle`, `libraryCatalog` and `issue`, which are properties of other entities.
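Before using an ISSN as a join key it is worth validating and normalizing it, since Wikidata stores P236 values in the hyphenated NNNN-NNNC form. The sketch below checks the ISSN check digit and builds a Wikidata Query Service query for matching items. The function names are hypothetical, and the code only builds the query string; it does not send it.

```python
import re
from typing import Optional

ISSN_RE = re.compile(r"^(\d{4})-?(\d{3}[\dXx])$")


def normalize_issn(raw: str) -> Optional[str]:
    """Return 'NNNN-NNNC' if the ISSN check digit is valid, else None."""
    m = ISSN_RE.match(raw.strip())
    if m is None:
        return None
    digits = (m.group(1) + m.group(2)).upper()
    # Weights 8..2 on the first seven digits; the check digit is
    # (11 - sum mod 11) mod 11, written as "X" when it is 10.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    if digits[7] != ("X" if check == 10 else str(check)):
        return None
    return f"{digits[:4]}-{digits[4:]}"


def journal_by_issn_query(issn: str) -> str:
    """SPARQL for the Wikidata Query Service: items whose ISSN (P236) matches."""
    normalized = normalize_issn(issn)
    if normalized is None:
        raise ValueError(f"invalid ISSN: {issn!r}")
    return (
        "SELECT ?journal ?journalLabel WHERE {\n"
        f'  ?journal wdt:P236 "{normalized}" .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


print(journal_by_issn_query("03178471"))
```

Rejecting bad check digits up front avoids silently creating a new journal item just because an ISSN was mistyped.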

Subtask 2: Improve ID import into Citoid


Additional IDs for Wikidata

  • e.g. JSTOR, OCLC, arXiv ID, IMDb, MR (Mathematical Reviews)
  • We're usually aware of these (e.g. when using Citoid on JSTOR or arXiv) but aren't importing them: Zotero puts them in the `extra` field, and Citoid already parses that field for PMID. Add more IDs to Zotero's `extra` field and more parsing to Citoid.
  • Idea:
    • pack some of these IDs into the `extra` field in the Zotero translators:
      https://github.com/zotero/translators/commit/046a7a584ca901744e74f586e3123b5eb9d7facc
      https://github.com/zotero/translators/pull/1065
    • make sure Citoid understands that
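On the Citoid side, parsing more IDs out of `extra` could look like the following Python sketch. Citoid itself is written in JavaScript; this only illustrates the parsing logic. The key spellings (`JSTOR ID`, `OCLC`, `arXiv ID`) are assumptions to check against what the translators actually write; the property IDs are Wikidata's PubMed ID (P698), PMCID (P932), JSTOR article ID (P888), OCLC control number (P243) and arXiv ID (P818).

```python
import re

# Hypothetical key spellings for IDs packed into Zotero's "extra" field,
# each mapped to the Wikidata property it would populate.
EXTRA_KEY_TO_PROPERTY = {
    "pmid": "P698",      # PubMed ID
    "pmcid": "P932",     # PMCID
    "jstor id": "P888",  # JSTOR article ID
    "oclc": "P243",      # OCLC control number
    "arxiv id": "P818",  # arXiv ID
}

# One "Key: value" pair per line; the key stops at the first colon.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(\S.*?)\s*$")


def ids_from_extra(extra: str) -> dict[str, str]:
    """Collect known identifiers from an `extra` field as {property: value}."""
    found: dict[str, str] = {}
    for line in extra.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # free-text note, not a key/value line
        prop = EXTRA_KEY_TO_PROPERTY.get(m.group(1).casefold())
        if prop is not None and prop not in found:
            found[prop] = m.group(2)  # keep the first occurrence
    return found


extra = "PMID: 12345678\nJSTOR ID: 1234567\nSome free-text note"
print(ids_from_extra(extra))
```

Unknown keys are ignored rather than rejected, since `extra` is also used for free-form notes.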

Support non-CrossRef DOIs in Citoid

Options:
  • add a DataCite translator
  • use the DOI.org API
  • Possible issues: different formats from different registration agencies
  • The translator would still need to be rewritten, since it currently relies on COinS (!) in the API response.
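For the DOI.org option, one possible approach (an assumption, not what Citoid currently does) is DOI content negotiation: Crossref and DataCite both return CSL JSON when doi.org is asked for `application/vnd.citationstyles.csl+json`, which could soften the differing-formats problem, and `https://doi.org/ra/<DOI>` reports which registration agency a DOI belongs to. The Python sketch below only builds the requests; `fetch_csl` would perform the network call. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def _doi_path(doi: str) -> str:
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return urllib.parse.quote(doi, safe="/")


def ra_lookup_url(doi: str) -> str:
    """doi.org endpoint reporting which registration agency issued a DOI."""
    return "https://doi.org/ra/" + _doi_path(doi)


def csl_request(doi: str) -> urllib.request.Request:
    """Content-negotiation request asking doi.org for CSL JSON metadata."""
    return urllib.request.Request(
        "https://doi.org/" + _doi_path(doi), headers={"Accept": CSL_JSON}
    )


def fetch_csl(doi: str, timeout: float = 10.0) -> dict:
    """Fetch CSL JSON over the network (not exercised in the example below)."""
    with urllib.request.urlopen(csl_request(doi), timeout=timeout) as resp:
        return json.load(resp)


req = csl_request("10.1000/xyz123")
print(req.full_url, req.get_header("Accept"))
print(ra_lookup_url("10.1000/xyz123"))
```

CSL JSON is also the format Zotero's citation tooling already speaks, but fields still vary between agencies, so a mapping layer would likely remain necessary.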

Integrate general translators (Metadata, Highwire, etc.) instead of duplicating these functions