This project aims to experiment with what a software solution that allows explicitly formalizing relations, such as Wikibase, could bring to meet the needs of wiktionarian projects in terms of cohesively structured data.

status: experimental
project summary: in situ — experimenting with a Wikibase that responds to the requirements of wiktionarian projects
creator: psychoslave
volunteer: Csisc
this project needs: volunteers, developers, template gurus, sysops, ontologists, communication facilitators
created on: 15:01, 31 May 2021 (UTC)

Rationale


The various linguistic versions of Wiktionary collect a lot of redundant information; that is, they share little information in common. Even within a given instance, information like a quotation or a definition on a page will often be manually duplicated in full on another page, and nothing will prevent divergent evolution of these duplicated data. Furthermore, the data are not structured in a fashion that eases querying information at a fine level of granularity, nor one that simplifies cross-referencing data.

On the other hand, projects like Wikidata and its Lexicographical data follow a path toward more cohesively structured data that eases these points. But currently they don't provide much to leverage for tackling Wiktionary-specific needs, as they are oriented toward very different goals and priorities. Furthermore, Wiktionary being licensed under CC BY-SA while Wikidata uses CC0 makes any significant transfer of information from Wiktionary to Wikidata and its Lexeme extension legally impossible.

Of course, Wiktionary as it is does have many conveniences, like the flexibility of structuring data through simple wikicode, templates, modules and so on. It has several solid linguistic communities with over a decade of common work and an international user group, the Tremendous Wiktionary User Group (TWUG).

No obvious, simple, quick path is known to get the best out of these two approaches, so this project doesn't come with any grand scheme for reaching it. Instead, this project will proceed by small steps: experiment, gather feedback, improve, repeat.

Contributors wanted


This project is specifically willing to help wiktionarian communities, so having contributors from its different linguistic versions would be warmly welcome. A simple hello on the talk page would already be greatly appreciated, and more thorough comments are encouraged.

We also specifically need people with:

  • skills to spread the word both within Wikimedia circles and beyond (communication facilitators)
  • will to formalize lexicological/lexicographic data models (ontologists)
  • interest in developing Mediawiki/Wikibase extensions (developers)
  • experience with Wikibase deployment and maintenance, especially of tools in Wikimedia Cloud Services (sysops)

Current focus


The project currently focuses on setting up a Wikibase instance on Wikimedia Cloud Services (WCS) and filling it with some quotes imported from wiktionarian projects. Quantitatively, it's not expected to go further than importing a few thousand items at most, if bots are used.

Please note that this first experiment will especially not include material such as definitions, grammatical classes, and so on. Indeed, this choice of focusing on quotations is made to get something going already, build a team with experience in deploying and maintaining a Wikibase instance, and transfer some wiktionarian data into it. That way, the whole project won't be completely stuck in the data modeling part before anything browsable can be shown. This approach nonetheless already requires a proper model for quotations. Luckily, the Structured Wikiquote project has already paved the way in this regard.
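To make the target concrete, here is a minimal sketch of what a quotation could look like as a Wikibase item, following Wikibase's JSON serialization. The property ID "P1" and the item label are hypothetical placeholders for illustration, not properties of any existing instance.

```python
# Hedged sketch: a quotation stored as a Wikibase item. "P1" is a
# hypothetical "quoted text" property; no real instance is assumed.

quotation_item = {
    "labels": {
        "en": {"language": "en", "value": "UDHR, article 1, first sentence"}
    },
    "claims": {
        "P1": [  # hypothetical "quoted text" property
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P1",
                    "datavalue": {
                        "type": "string",
                        "value": "All human beings are born free and equal "
                                 "in dignity and rights.",
                    },
                },
                "type": "statement",
                "rank": "normal",
            }
        ]
    },
}

# The quoted text can then be retrieved by property ID:
text = quotation_item["claims"]["P1"][0]["mainsnak"]["datavalue"]["value"]
```

Storing the quotation as a statement (rather than raw wikicode) is what makes fine-grained querying and cross-referencing possible later on.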

Roadmap


This section gathers some data on what has already been done and what is expected over the course of this project.

  •   Done Data gathering about possibility to host a Wikibase in WCS during Wikimedia Hackathon 2021
  • State of the art
    •   Done find out whether other initiatives have already built something around Wikibase and quotations
    •   Done fill the See also section below with related links
  • Structuring the project
    •   Done Meta page
    •   Done tweak the {{Probox}} used on this page to include this list   (template gurus)
    • Create some instant messaging room to discuss informally
  • Team building and community involvement
    • making wikimedians aware of the project
      • on wiki calls to join the project
      • spread the word on instant messaging platforms and social media
        • Discord
        • Facebook
        • Telegram
        • Matrix
        • Twitter
        • Zulip
    • determine and announce needed skills and resources
  • Wikibase instance
    • deployment with the required ontology to test importing quotes extracted from wiktionarian projects
  • Lexical data model
    • animate conversations around what is needed and ideas to meet these requirements
    • work out at least one specific proposal, build consensus on it, and refine it into a data model
    • implement the data model and deploy it on the Wikibase instance
    • test the model and specify how data should be imported
    • call for more tests from community
  • Assessment of obtained results, determination of next steps

Reports


2021-06-25 Teleconference


Csisc and psychoslave quickly exchanged some information about current possibilities to prototype something.

  • https://www.wbstack.com/ allows launching a MediaWiki/Wikibase instance very quickly, which should be great for our first drafts
    • The WBStack Telegram group has been joined and some discussion started there around in situ.
    • GreenReapder indicated that, to set a specific licence and allow importing CC-BY-SA-3.0 information, it will be necessary to use the front page, sidebar and/or MediaWiki:Anonnotice/MediaWiki:Sitenotice/MediaWiki:Editnotice-[namespaceID] to announce it, rather than MediaWiki:Copyright, due to current permission restrictions on the platform. More information is given in "Allow users to alter the sidebar" · Issue #52 · wbstack/mediawiki.

2021-08-18 Teleconference


Csisc and psychoslave exchanged for 40 minutes, as an instance was finally set up for prototyping ideas around Trans Situ.

  • https://trans-situ.wiki.opencura.com is the dedicated instance
  • psychoslave presented a UML use case that underlies the linguistic perspective of this project, along with an admittedly rather cluttered class diagram produced to support such a view, while expressing that something far simpler should be targeted as a first prototype
  • Csisc proposed to reduce the model to two classes and will draft something in the next few days based on that idea. The two classes discussed were
    • 1. Relation
      • type:
      • subject:
      • object:
    • 2. Utterance
      • matter: text, for example "All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood."
  • psychoslave will ask what can be done to deal properly with license and switch the wiki statements to CC-by-sa instead of the current CC-0 labels
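The two-class model above can be sketched as follows. The field names follow the meeting notes; the Python types and the example values for the relation are assumptions added purely for illustration.

```python
from dataclasses import dataclass
from typing import Any

# Hedged sketch of the two classes discussed: names and fields come from
# the meeting notes; the concrete types are assumptions.

@dataclass
class Utterance:
    """A piece of text, such as a quotation."""
    matter: str

@dataclass
class Relation:
    """A typed link between a subject and an object."""
    type: str
    subject: Any
    object: Any

# Example: relating an utterance to its source. The relation type and the
# source label are hypothetical placeholders, not part of any agreed model.
udhr_sentence = Utterance(
    matter="All human beings are born free and equal in dignity and rights."
)
link = Relation(
    type="is quoted from",
    subject=udhr_sentence,
    object="Universal Declaration of Human Rights",
)
```

Reducing everything else to typed Relation instances keeps the first prototype minimal while still letting richer structures (definitions, grammatical classes, ...) be layered on later.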

Data model proposals


Several data models (ontologies) and approaches might be envisioned to meet the requirements of in situ. Two main roads are already identified:

  • a single Wikibase instance to be used by all Wiktionary versions and other projects, hence called trans situ;
  • one Wikibase instance for each linguistic version of Wiktionary, hence called per situ.

Other approaches are still warmly welcome for now.

Notes



References




See also


Participants
