WikiCite 2016/Report/Group 4/Notes
Notes and links
- see also StrepHit 1.0 Beta Release
- see also IEG Grant – StrepHit: Wikidata Statements Validation via References
Goal
- Play with the current StrepHit dataset (biographies in English): DONE
- Create and fill a Request for Comments: DONE
- Encourage referenced data donations through the primary sources tool: DONE
- Follow up on past discussion with ContentMine and Hypothes.is people: DONE
Notes
- primary sources tool: editing the statement if something's wrong
- Till with genomic datasets
- transcripts for d:Q414043
- 10k genes
- domain-specific use cases == domain-specific curation tools
- different colors
- use hypothes.is API to highlight extracted sentences
- ContentMine
It would be great if you could add a statement of interest about ContentMine's potential data donation via the primary sources tool here (feel free to add a new section): https://meta.wikimedia.org/wiki/Grants_talk:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Timeline
Instructions to upload a dataset to the primary sources tool:
1. format your data in the QuickStatements syntax; documentation at http://tools.wmflabs.org/wikidata-todo/quick_statements.php
2. ping me for an API access token
3. upload the dataset through the following API endpoint:
- https://tools.wmflabs.org/wikidata-primary-sources/import
- Documentation at https://github.com/google/primarysources/tree/master/backend#import-statements
As an alternative to steps 2 and 3, you can just give the dataset to Hjfocs and he will upload it directly.
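Before uploading or handing over a dataset, it may help to sanity-check each line. The sketch below is only an assumption about what a minimal check could look like: it covers a small subset of the QuickStatements syntax (tab-separated item, property, value, then optional qualifier or source pairs), and the function name `check_line` is hypothetical. It does not validate the value field and is not part of the primary sources tool.

```python
import re

# Patterns for the ID fields; values (items, quoted strings, dates, ...) are not checked here.
ITEM = re.compile(r"^Q[1-9]\d*$")
PROPERTY = re.compile(r"^P[1-9]\d*$")
QUALIFIER_OR_SOURCE = re.compile(r"^[PS][1-9]\d*$")


def check_line(line):
    """Return a list of problems found in one tab-separated statement line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["expected at least item, property, and value"]
    problems = []
    if not ITEM.match(fields[0]):
        problems.append("subject is not an item ID: %r" % fields[0])
    if not PROPERTY.match(fields[1]):
        problems.append("property is not a property ID: %r" % fields[1])
    # Everything after the value should come in (property, value) pairs.
    extras = fields[3:]
    if len(extras) % 2 != 0:
        problems.append("qualifier/source fields must come in property-value pairs")
    for key in extras[0::2]:
        if not QUALIFIER_OR_SOURCE.match(key):
            problems.append("not a qualifier or source property: %r" % key)
    return problems
```

An empty list means the line passed these basic checks; anything else is worth a look before the dataset goes to the import endpoint.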
Data modeling, i.e., from ContentMine extraction results to the QuickStatements dataset.
- Each statement is composed of:
- A. subject = given the extracted named entity, look up the subject Wikidata Item ID via
- A.1. SPARQL
- A.2. API endpoint: https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities
- B. property = d:property:P248 'stated in'
- C. value = item ID of the source, e.g., d:Q229883 for PubMed Central
- D. reference URL = d:P854
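The modeling steps above can be sketched roughly as follows. This is an illustration under assumptions, not the actual StrepHit or ContentMine code: the helper names are hypothetical, the lookup simply takes the top `wbsearchentities` hit (real data would need disambiguation), and the reference URL is attached as an `S854` source field, which is how QuickStatements expresses a P854 reference. The HTTP request itself is left out; `first_item_id` takes the JSON text returned by the API.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(entity_name, language="en", limit=5):
    """Step A.2: build a wbsearchentities request URL for an extracted named entity."""
    params = {
        "action": "wbsearchentities",
        "search": entity_name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def first_item_id(response_text):
    """Return the item ID of the top search hit, or None when there are no hits."""
    hits = json.loads(response_text).get("search", [])
    return hits[0]["id"] if hits else None


def to_quickstatement(subject_id, source_id, reference_url):
    """Steps A-D: subject, P248 'stated in', source item, S854 reference URL."""
    return "\t".join([subject_id, "P248", source_id, "S854", '"%s"' % reference_url])
```

For example, `to_quickstatement("Q414043", "Q229883", url)` yields one tab-separated line stating that the item is 'stated in' PubMed Central, with `url` (a placeholder for the article address) as the reference.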
Side notes
- FrameNet lexical database for N-ary relation extraction:
https://framenet.icsi.berkeley.edu/fndrupal/
- Instead of 'stated in', a better property would be 'mentioned in', but it has been rejected: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Archive/45#mentioned_in
- Adam: references collected from Microdata
- especially for movies
- Google Custom Search for specific microformats (cf. Sindice)