Grants:Project/DBpedia/GlobalFactSyncRE/Timeline


Timeline for DBpedia

Timeline | Date
Study (choose two initial sync targets and analyse the lack of references in Wikidata) | Day Month Year
GlobalFactSync tool (extend the current prototype with new features) | Day Month Year
Mapping Refinements | Day Month Year
GlobalFactSync WikiData ingest | Day Month Year
GlobalFactSync Sprints | Day Month Year


Monthly updates


Please prepare a brief project update each month, in a format of your choice, to share progress and learnings with the community along the way. Submit the link below as you complete each update.

Current tasks


A log of current tasks is kept here. Ongoing discussions should be held using the corresponding discussion page.

(Preparation) April/May


June 2019 (official start)


July 2019


First Release Report: A first release containing detailed information about our micro-services is published on the DBpedia Blog

Containing:

  • First success story
  • Deployment of first micro-services on the server
    1. Initial User Interface
    2. PreFusion JSON API (user: read, pw: gfs)
    3. Reference Extraction Service
    4. Reference Data Download
    5. Infobox Extraction Service
    6. ID service
  1. definition of a set of problems with different layers of complexity
  2. analysis of various groups of subjects with respect to these synchronization problems

August 2019

  • Continued improvement of the first deployments, which will be an ongoing process. Work currently focuses on the GFS Data Browser:
    • users can now insert any Wikipedia URL into the subject search field
    • overall layout improvements
    • reference information is being added
  • Johannes Frey presented the GFS project at Wikimania
  • We created a news page within our Meta-Wiki project page framework for volunteers to keep them in the loop and encourage exchange. So far this has led to three more volunteers signing up for our 'GFS Feedback Squad' and two users leaving feedback about our sync target study.

September 2019

  • more work towards the sync target study, with a focus on targets that were brought up by Wikidata users (e.g., geo coordinates, employer, Nobel Prize)
  • intensive work on creating the complement to Wikidata and Wikipedia by collecting and providing data that is currently missing in both

October 2019


November 2019

  • re-extraction of GFS data and fusion
  • some work on the UI
  • identifying and testing ways to generate lists of the Wikipedia articles related to selected topics: categories, infoboxes, Wikidata queries and other articles (lists).

December 2019

  • extraction of reference data for Polish cities; studied sources: BDL - Bank Danych Lokalnych, Wikipedia, Wikidata
  • analysis of available mappings between various geographical identifiers for Polish administrative units
  • showing current understanding of the fusion challenge

January 2020


February 2020


March 2020

  • experimental prototype for an improved HarvestTemplates tool
    • index Infoboxes / Templates

April 2020

  • experimental prototype for an improved HarvestTemplates tool
    • index Infoboxes / Templates

May 2020

  • watch for feedback on the new mockup

June 2020

  • incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump

Planned Next Steps for July, August and September 2020

  • incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump
  • GFS browser features
    • include mapping management to allow search for properties of new external sources


Is your final report due but you need more time?



Extension request


September 30, 2020


In the last month, the output of our project was barely visible because we (1) worked extensively on the data and (2) had to deal with the coronavirus pandemic and its consequences, such as missing child care. On the positive side, we have a substantial part of the budget (9000€) left and would like to extend the project by four months as a budget-neutral extension. We need time until the end of September 2020. Project-wise, we found this dump: enwiki-20200401-wbc_entity_usage.sql.gz

- It tracks which pages use which Wikidata items or properties and which aspect (e.g. item label) is used. We therefore consider it realistic to provide the following:

- We have one of the best infobox parsers and we have full information about all properties there. This means we can produce a reliable Wikidata adoption report, which shows how widely Wikidata is adopted, where it is well adopted in Wikipedia and where adoption can be improved.

- We can use this to calculate "good imports" from Wikipedia to Wikidata, i.e. where data in WP infoboxes is especially plentiful and well referenced, but missing in Wikidata

- With the improvements on https://tools.wmflabs.org/pltools/harvesttemplates/ we would have a powerful user interface to tackle exactly these spots
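
As a rough illustration of the adoption report idea, the sketch below counts how many pages with an infobox also use Wikidata statements. It assumes the usage rows have already been loaded from the dump as (page_id, entity_id, aspect) tuples and that statement usage is marked by an aspect code starting with "C". The function name, the sample rows and the set of infobox pages are hypothetical and only serve the example.

```python
from collections import defaultdict

def adoption_report(usage_rows, infobox_pages):
    """Summarise which infobox pages use at least one Wikidata statement.

    usage_rows: iterable of (page_id, entity_id, aspect) tuples (assumed layout).
    infobox_pages: set of page ids known to contain an infobox (hypothetical input).
    """
    aspects_by_page = defaultdict(set)
    for page_id, _entity_id, aspect in usage_rows:
        aspects_by_page[page_id].add(aspect)

    # Assumption: an aspect starting with "C" marks statement usage.
    adopted = {
        page_id
        for page_id in infobox_pages
        if any(a.startswith("C") for a in aspects_by_page.get(page_id, ()))
    }
    share = len(adopted) / len(infobox_pages) if infobox_pages else 0.0
    return {
        "adopted": sorted(adopted),
        "not_adopted": sorted(infobox_pages - adopted),
        "share": share,
    }

# Made-up sample rows, not taken from the real dump.
rows = [
    (1, "Q1", "C.P1082"),  # page 1 uses a statement
    (1, "Q1", "L.pl"),     # page 1 also uses a label
    (2, "Q2", "S"),        # page 2 only uses sitelinks
]
report = adoption_report(rows, infobox_pages={1, 2, 3})
print(report)
```

Pages listed under "not_adopted" would be the candidates to inspect for well-referenced infobox data that is still missing in Wikidata.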

In addition, we started to index authoritative datasets that are often referenced in WP and WD. Taking this data from the source, we can build an interface, e.g. a user script, that suggests relevant data points from these datasets to users for inclusion. This part might be experimental, but it would work like this: On https://pl.wikipedia.org/wiki/Pozna%C5%84 the infobox shows "Populacja (30.06.2019) • liczba ludności 535 802[3]" (Population (30.06.2019) • number of inhabitants 535 802[3]).

[3] is the population count from stat.gov.pl holding the official census for Poland. If this gets updated, we might be able to autodetect that a change is required either in the infobox or on Wikidata (that is up to the community policy).
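
The check described above can be sketched as follows. This is a minimal sketch, assuming the infobox text and the current value from the authoritative source are already available; the function names are hypothetical, and the second source value in the example is made up rather than a real figure.

```python
import re

def parse_count(text):
    """Parse a count such as '535 802[3]', ignoring footnote markers and spacing."""
    without_refs = re.sub(r"\[\d+\]", "", text)  # drop footnote markers like [3]
    digits = re.sub(r"\D", "", without_refs)      # drop spaces and other separators
    if not digits:
        raise ValueError(f"no number found in {text!r}")
    return int(digits)

def needs_update(infobox_text, source_value):
    """Return True when the infobox value differs from the authoritative value."""
    return parse_count(infobox_text) != source_value

infobox_value = "535 802[3]"  # as quoted from the infobox above
print(parse_count(infobox_value))
print(needs_update(infobox_value, 535802))
print(needs_update(infobox_value, 540000))  # 540000 is a made-up source value
```

Whether a detected difference leads to an edit of the infobox or of Wikidata stays a community decision; the sketch only flags the mismatch.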

This will not be complete, but it will probably work for 10-50 million entries in Wikipedia and Wikidata, depending on the quality of the source and how official it is. In the next few months we need to work on the following topics:

- incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump

- GFS browser features

- include mapping management to allow search for properties of new external sources

@Juliaholze: Hi Julia, thanks for this request and context over your remaining budget as well as the disruptions you experienced due to the pandemic. We can appreciate that work on the project needed to be paused in order to focus on other, more important priorities, as we have experienced these same needs at the Wikimedia Foundation as well. This extension until 30 September 2020 to complete the above activities is formally approved. Your final report will be due on 30 October 2020. I JethroBT (WMF) (talk) 21:25, 6 July 2020 (UTC)
@JethroBT (WMF): Hi Chris, many thanks for your reply. We will complete the above activities and tasks.

Extension request


New end date


November 30, 2020

Rationale


We would like to request another budget-neutral extension. The main reason is very similar to the previous one. We are currently in the process of adding many authoritative datasets to the GFS browser, which will then enable "official" data from the appropriate sources to be included in Wikipedia/Wikidata. In the next two months we need to work on the following topics:

  • GFS browser features
  • include mapping management to allow search for properties of new external sources

Please also see our email to the WMF Grants Administrator.

Approval


This request is approved. Your new Project end date is November 30, 2020, and your Final Report is due on December 30, 2020.

Marti (WMF) (talk) 19:08, 15 October 2020 (UTC)

Extension request


New end date


January 31, 2021

Rationale


Since the beginning of December 2020 we have again been dealing with the coronavirus pandemic and its consequences, such as a national lockdown and missing child care. I am sorry to inform you that we need more time to finish our final report for the GlobalFactSyncRE project. We have already started to write the report and have requested bank statements to document all expenses. We need more time to summarize all project results and document the outcome. We hope that you and your families are safe and well, despite the disruptions and consequences of covid. Kind regards, Julia

Approval


This request is approved. Your new project end date is January 31, 2021.

--Marti (WMF) (talk) 22:23, 15 January 2021 (UTC)

Noting here that your new final report due date is 2 March 2021. Thank you. -- JTud (WMF), Grants Administrator (talk) 23:13, 15 January 2021 (UTC)