Abstract Wikipedia/Updates/2022-11-09
Checking lexical forms
Previously we have discussed morphological paradigms and how lexemes and paradigms could be used. To summarize and simplify: paradigms are patterns by which a word (or lexeme) is inflected, and functions can implement both paradigms and specific inflections. For example, the usual way to form the plural of an English noun is to add the letter s to its basic form, the so-called 'lemma'.
On Notwikilambda, the community-run preview version of Wikifunctions, we started implementing a few such functions. We have also recreated some of them in the Wikifunctions Beta, e.g. "add s to end" and "replace y at end with ies".
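To make the idea concrete, here is a minimal Python sketch of what these two functions compute. It is not the Wikifunctions implementation; the Python names, and the choice to return words that do not end in "y" unchanged, are our own assumptions.

```python
def add_s_to_end(lemma: str) -> str:
    """Regular English plural: append "s" to the lemma (cat -> cats)."""
    return lemma + "s"


def replace_y_at_end_with_ies(lemma: str) -> str:
    """Plural for nouns ending in "y": drop the "y", append "ies" (city -> cities).

    Assumption: words that do not end in "y" are returned unchanged.
    """
    if lemma.endswith("y"):
        return lemma[:-1] + "ies"
    return lemma


print(add_s_to_end("cat"))                 # cats
print(replace_y_at_end_with_ies("city"))   # cities
```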
To demonstrate their use, we developed a small, browser-based tool, form check. Form check allows you to select a language and a part of speech (e.g. English nouns), and then state which forms you want to generate (e.g. the plural). Then you choose the function from the Wikifunctions Beta, and the tool checks whether the form as recorded in Wikidata corresponds to the output of the function.
If they don't match, the mismatch may indicate an error in the function or in the data, or simply an irregular form.
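The core comparison can be sketched in a few lines of Python. This is an illustration of the idea, not Form check's actual code (which is JavaScript); the `check_forms` name is hypothetical, and the sample pairs are made up for illustration rather than taken from Wikidata.

```python
from typing import Callable, Iterable


def check_forms(
    entries: Iterable[tuple[str, str]],
    inflect: Callable[[str], str],
) -> list[tuple[str, str, str]]:
    """Compare recorded forms with the output of an inflection function.

    entries: (lemma, recorded_form) pairs, e.g. lemmas and plurals from Wikidata.
    inflect: the function under test, e.g. "add s to end".
    Returns (lemma, recorded, generated) for every mismatch.
    """
    mismatches = []
    for lemma, recorded in entries:
        generated = inflect(lemma)
        if generated != recorded:
            mismatches.append((lemma, recorded, generated))
    return mismatches


def add_s_to_end(lemma: str) -> str:
    return lemma + "s"


# Illustrative sample data (not fetched from Wikidata).
sample = [("cat", "cats"), ("dog", "dogs"), ("mouse", "mice"), ("city", "cities")]
for lemma, recorded, generated in check_forms(sample, add_s_to_end):
    print(f"{lemma}: recorded {recorded!r}, function gives {generated!r}")
```

Here "mouse" is an irregular form and "city" needs a different function; neither is a data error, which is why a mismatch only ever flags a candidate for a human to look at.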
Form check has at least one major shortcoming: it does not currently let you filter by further statements on the lexeme. In many languages this is crucial; in German, for example, nouns are inflected differently depending on their grammatical gender. It also doesn't automatically update the list of available functions (but you can enter an arbitrary ZID). The code is open source, and contributions (or, indeed, someone wanting to take over the code) would be more than welcome.
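One way such a filter could work is sketched below, assuming each lexeme record already carries its grammatical gender. The `Lexeme` record, the `add_n_to_end` function, and the sample German nouns are hypothetical illustrations; real German plural formation is considerably more varied than a single rule per gender.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lexeme:
    lemma: str
    plural: str
    gender: Optional[str]  # e.g. "feminine"; None if the statement is missing


def check_with_gender_filter(
    lexemes: list[Lexeme],
    inflect: Callable[[str], str],
    gender: str,
) -> list[tuple[str, str, str]]:
    """Check only lexemes with the given gender; return the mismatches."""
    mismatches = []
    for lx in lexemes:
        if lx.gender != gender:
            continue  # skip lexemes outside the selected subset
        generated = inflect(lx.lemma)
        if generated != lx.plural:
            mismatches.append((lx.lemma, lx.plural, generated))
    return mismatches


def add_n_to_end(lemma: str) -> str:
    # Hypothetical rule: many feminine nouns ending in -e add -n (Blume -> Blumen).
    return lemma + "n"


nouns = [
    Lexeme("Blume", "Blumen", "feminine"),
    Lexeme("Katze", "Katzen", "feminine"),
    Lexeme("Hand", "Hände", "feminine"),
    Lexeme("Hund", "Hunde", "masculine"),
]
print(check_with_gender_filter(nouns, add_n_to_end, "feminine"))
```

Without the filter, every masculine and neuter noun would show up as a mismatch and drown out the interesting cases.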
It is said that it is better to show than tell. In this spirit, we created a 13-minute video. It demonstrates how the form check tool is used, how it helped us find an error in a lexeme on Wikidata, and how it was used to discover a paradigm and implement the corresponding function.
We invite you to implement more morphological functions in Wikifunctions Beta, and to try them out with the form check tool. Please report any errors you find along the way, so we can fix them. Please also share your results, and how well your functions cover the different linguistic variations in your language!
There are a number of interesting aspects to this demonstration.
Firstly, it shows how Wikifunctions, as currently implemented, can be used for natural-language functions. It ties in directly with the data on Wikidata, and offers both a way to find errors in that data and a means of exploration that can help find patterns in it, and thus create more such functions. Although I don't speak Ukrainian, I was able to create a function that captured the morphology of a specific Ukrainian form. These functions can then, in turn, help us discover more inconsistencies, or even enter data faster and with fewer errors. For example, I would really love to be able to attach functions to the fields in Wikidata Lexeme Forms, so that I would only enter the lemma, the other fields would be filled in automatically from the functions' results, and I could then correct them manually if needed before publishing.
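The workflow wished for above could look roughly like this: every form field is prefilled by a function applied to the lemma, and any manual correction takes precedence. This is a hypothetical sketch; Wikidata Lexeme Forms offers no such hook today, and `prefill_forms` and the paradigm mapping are illustrative names only.

```python
from typing import Callable, Optional


def prefill_forms(
    lemma: str,
    paradigm: dict[str, Callable[[str], str]],
    overrides: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Fill each form field from its function, then apply manual corrections."""
    forms = {field: fn(lemma) for field, fn in paradigm.items()}
    if overrides:
        forms.update(overrides)  # the human always has the last word
    return forms


# Hypothetical paradigm for regular English nouns.
english_noun = {
    "singular": lambda w: w,
    "plural": lambda w: w + "s",
}

print(prefill_forms("cat", english_noun))
print(prefill_forms("child", english_noun, {"plural": "children"}))
```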
Secondly, it shows how relatively easy it is to write functions, testers, and implementations. In this case, it took us less than four minutes to define the functions, write a tester, and provide an implementation. Our UX is currently being improved to make many of these steps easier and more intuitive. Not all functions will be as easy to implement, but in this case no coding was required at all, since we already had a suitable function to use in a composition: "replace at end". Our hope is that a solid library of such versatile functions can take us a long way towards good coverage of morphological functions. And even where an implementation turns out to be more complex, we expect that many potential contributors will be able to define the function and provide test cases.
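The following Python sketch shows why composition removes the need for new code: once a general "replace at end" exists, specific inflections are just calls to it with fixed arguments. The parameter order (word, old ending, new ending) and the behaviour when the ending is absent are assumptions, not the actual Wikifunctions signature.

```python
def replace_at_end(word: str, old_ending: str, new_ending: str) -> str:
    """Replace old_ending at the end of word with new_ending.

    Assumption: if word does not end with old_ending, it is returned unchanged.
    An empty old_ending matches every word, so this can also append a suffix.
    """
    if word.endswith(old_ending):
        return word[: len(word) - len(old_ending)] + new_ending
    return word


# Specific inflections as compositions of the general function.
def add_s_to_end(word: str) -> str:
    return replace_at_end(word, "", "s")


def replace_y_at_end_with_ies(word: str) -> str:
    return replace_at_end(word, "y", "ies")


print(add_s_to_end("cat"))                # cats
print(replace_y_at_end_with_ies("city"))  # cities
```

Slicing with `len(word) - len(old_ending)` rather than `word[:-len(old_ending)]` keeps the empty-ending case correct, since `word[:-0]` would return an empty string.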
And thirdly, it shows, probably for the first time, an external tool calling a function from Wikifunctions (albeit the Beta). It is just a website, standing in front of Wikifunctions, asking it to evaluate a function: Form check queries the SPARQL endpoint of Wikidata and then uses the data it gets back to ask Wikifunctions to evaluate a function. The whole thing is a static website that needs no libraries at all, merely plain old JavaScript, and it could be hosted anywhere (in fact, you can also download the HTML and load the page locally; it should work just as well).
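The two requests involved can be sketched as follows, in Python rather than the tool's JavaScript, and without any network access. The SPARQL query selects English nouns with their plural forms using the Wikidata items for English (Q1860), noun (Q1084), and plural (Q146786). The function-call object follows the Wikifunctions convention of a Z7 (function call) whose arguments are keyed by the function's ZID plus K1, K2, …; the ZID `Z10000` is a hypothetical placeholder, and how the object is submitted to Wikifunctions is deliberately left out.

```python
import json
from urllib.parse import urlencode

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# English nouns and their plural forms (prefixes are predefined on the endpoint).
QUERY = """
SELECT ?lemma ?plural WHERE {
  ?lexeme dct:language wd:Q1860 ;
          wikibase:lexicalCategory wd:Q1084 ;
          wikibase:lemma ?lemma ;
          ontolex:lexicalForm ?form .
  ?form wikibase:grammaticalFeature wd:Q146786 ;
        ontolex:representation ?plural .
}
LIMIT 100
"""


def sparql_url(query: str) -> str:
    """GET URL asking the Wikidata Query Service for JSON results."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


def parse_bindings(response: dict) -> list[tuple[str, str]]:
    """Extract (lemma, plural) pairs from a SPARQL JSON results document."""
    return [
        (b["lemma"]["value"], b["plural"]["value"])
        for b in response["results"]["bindings"]
    ]


def function_call(function_zid: str, argument: str) -> str:
    """Serialize a Z7 function call with a single string argument."""
    return json.dumps({"Z1K1": "Z7", "Z7K1": function_zid, function_zid + "K1": argument})


print(sparql_url(QUERY)[:60])
print(function_call("Z10000", "city"))  # Z10000 is a hypothetical placeholder
```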
Note that I am rather unsure whether Form check is a good and useful tool. Do we really want every individual user to check thousands of forms? We would probably want a shared resource for this evaluation instead. The tool is meant as an early inspiration that will hopefully lead to other tools, libraries, and workflows that are more robust and reusable, and more closely aligned with how the community works.
Volunteer’s corner
Thanks to everyone who joined the volunteer's corner on Monday; it was a lively session. The next one will be on Monday, December 5 at 18:30 UTC.
WikiConference North America 2022
This weekend Wikifunctions will be presented at WikiConference North America, held jointly with OpenStreetMap US. The presentation will be on Saturday, November 12 at 20:15 UTC, and will focus on Wikifunctions and its possible use cases in the world of maps.
Staff editing policy
For now, we are closing the hot phase of the staff editing policy. The policy belongs to the community, and can always be evolved and adapted by you. At launch, we will copy it over to Wikifunctions, and we will follow it.
Development updates
Experience & Performance:
- Fixed more front-end (FE) bugs
- Merged patches related to error management
- Made great progress on drafting the Default Component technical specs
Meta-data:
- Completed human-readable summaries of all error types, and added the ability to record which implementation gets selected (T312611, T320457)
Natural Language Generation:
- Finalized template language document
- More analysis on dependencies for isiZulu