Small Wikipedia Community Sustainability/Wikidata

Want a bright future for your nation and its culture? Take it into your own hands: start contributing to Wikipedia in your own language and teach others to do the same!

Modern language communicative value

Presentation Slides (EN) and draft presentation text
Artificial life support of whatever unique cultural phenomenon and/or languages and worldviews relied on when it came into being or was later in active use, without proven utility of the latter for addressing current life's practical tasks is no more valuable than worshipping cuneiform or any other historic technology.

— Farhad Fatkullin[1]

Imagine a world where all public and private services and all digital documents are immediately and permanently accessible in every language of the planet, and where anyone can interact just as easily with any other person or any object in the global Internet of things. Semantic Web technology is making this real!
Any language structure is a set of various lexicographical data types.
Wikifunctions will help make any communication language-independent.

This work grew out of internal discussions among volunteers of the Wikimedia Languages of Russia Community during their participation in both Wikimedia Russia-initiated and global Wikimedia Language Diversity projects. It was summarized and published by Farhad Fatkullin (Kazan, Russia), with special thanks to Renat Shigapov (Germany) for help with the WikibaseCirrusSearch used in data collection, and to Paul Kaganer (Saint Petersburg, Russia) for recommendations and support in choosing the topic, critical feedback along the way, and proposals for developing the analysis further. The first publication and presentation of the material is planned to take place in Tatar as part of the 2nd Russia-wide "Language, Society and Information Technologies" Scientific and Practical Conference (17-18 Feb. 2023).

The following proposes a quantitative method for estimating the amount of work needed to sustainably keep any of the languages of Russia in a hypothetical digital "human-stationary" orbit. The state of each language is assessed using absolute and relative figures for the labels and descriptions of Wikidata knowledge base items in that language, together with the lexicographical data (lexemes, forms and senses) used to describe the relationships between them that existing Wikifunctions depict. The analytical tables below contain both raw statistics and calculated shares as of the most recent query, and will be updated periodically.
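
The relative figures in the tables can be reproduced with a calculation along the following lines. This is only a minimal sketch: the text does not state the total number of Wikidata items at query time, so `ASSUMED_TOTAL_ITEMS` is a placeholder chosen to be roughly consistent with the label shares shown below, and the names `LanguageStats`, `share_percent` and `summarize` are hypothetical helpers, not part of any Wikimedia tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageStats:
    code: str
    labels: int        # items that have a label in this language
    descriptions: int  # items that have a description in this language


def share_percent(count: int, total_items: int) -> float:
    """Return count as a percentage of total_items, rounded to 2 decimals."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return round(100 * count / total_items, 2)


def summarize(stats, total_items):
    """Map each language code to its (labels %, descriptions %) pair."""
    return {
        s.code: (
            share_percent(s.labels, total_items),
            share_percent(s.descriptions, total_items),
        )
        for s in stats
    }


# ASSUMPTION: the real total at query time is not given in the text;
# this placeholder is only roughly consistent with the published shares.
ASSUMED_TOTAL_ITEMS = 101_800_000

# Label and description counts copied from the tables on this page.
SAMPLE = [
    LanguageStats("ru", 8_775_094, 39_399_582),
    LanguageStats("tt", 980_788, 6_509_824),
    LanguageStats("ba", 594_414, 6_570_985),
]

if __name__ == "__main__":
    for code, (labels, descriptions) in summarize(SAMPLE, ASSUMED_TOTAL_ITEMS).items():
        print(f"{code}: labels {labels}%, descriptions {descriptions}%")
```

Because the counts and the total change with every query, the shares are best recomputed from fresh counts each time the tables are updated rather than edited by hand.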

Thesis

Culture is a non-genetic means of transferring information,[2] and a language is a communication protocol used within its unique cultural environment.[3] Language speakers (users) are nodes that create, operatively store and exchange information, while the ongoing interaction of language users is distributed data processing that generates new knowledge, reorganizes the structure of society or transforms its essence (by changing how it interacts with the surrounding physical and biological world).[4]

Within such a model, the instrumental function of a standardized working language is to ensure the speed and precision of communication. In the era of the knowledge-based economy, any subject, object or information product simultaneously takes part in a multitude of parallel processes. The long-term viability of an individual natural language (and of the related cultural knowledge) depends on how well it is supported and how effectively it can be used as a tool within an ever-changing technological and social environment.[5]

An individual or a community of speakers will keep investing in preserving their linguistic competencies, and in the continuous development of the communication protocol in question, for as long as it can be used effectively within the global ecosystem of added-value creation (contribution to global GDP).[6] Thus the long-term preservation of global multilingualism depends on humanity's transition towards technologies that allow language equality within systems of economic cooperation based on the global division of labour.[7]

This is exactly the founding pillar of language-independent Semantic Web (Web 3.0) technology. These opportunities have already been successfully demonstrated:

  • since 2012: the Wikidata knowledge base (along with other semi-structured data repositories based on independent Wikibase and similar installations, and the generation of knowledge blocks and their display in a target language),
  • 2016-2018: the transition of language description into the Lexicographical data format (increasingly popular and likely to become a foundation of European language equality, including in state and municipal services, which is aimed to be reached by 2030), and
  • 2020-2023: the launch of Wikifunctions, which relies on the achievements above (a recent pilot limited to 6 languages has been completed successfully, and the project is now getting ready for full production launch).

Unlike other Wikimedia projects, these are published and distributed under the license with the highest degree of legal freedom (Creative Commons Zero, CC0, Public Domain). Everyone interested in helping ensure the reliable long-term sustainability of their favourite languages is invited to join the volunteers developing those languages' presence in the systems above!

Statistical Tables

Tables below (except for the comparative example) include languages which satisfy at least one of the two conditions:

Note: * marks languages that have official status in an administrative-territorial entity of Russia but whose speakers mostly reside outside the Federation's borders.

Comparative Analysis Source Data & Example Table

| Language ([[MainPage]]) | Code ([[Category:]]) | Labels (number) | Labels (%) | Descriptions (number) | Descriptions (%) | Lexemes (number) | Forms (number) | Senses (number) |
|---|---|---|---|---|---|---|---|---|
| English | en | 86534411 | 85 | 84304578 | 82.8 | 72561 | 132029 | 30160 |
| Arabic | ar | 6361419 | 6.2 | 51285699 | 50.4 | 1384 | 310 | 136 |
| Spanish | es | 20974374 | 20.6 | 43824405 | 43 | 30480 | 352459 | 10425 |
| Chinese | zh | 6317996 | 6.2 | 34690543 | 34.1 | 4322 | 4458 | 3995 |
| Russian | ru | 8775094 | 8.6 | 39399582 | 38.7 | 101554 | 1238078 | 12833 |
| French | fr | 21394767 | 21 | 50751745 | 49.8 | 19271 | 325157 | 9482 |
| German | de | 21147425 | 20.8 | 65515077 | 64.3 | 213398 | 550155 | 10269 |
| Turkish | tr | 1980389 | 1.9 | 33405809 | 32.8 | 225 | 188 | 189 |
| Hindi | hi | 1185205 | 1.2 | 8194601 | 8 | 1407 | 2430 | 1892 |
| Farsi | fa | 2773491 | 2.7 | 15532782 | 15.3 | 6768 | 11417 | 7425 |

Russian Federation languages with active Wikipedias (34)

| Group | Language ([[MainPage]]) | Code ([[Category:]]) | Labels (number) | Labels (%) | Descriptions (number) | Descriptions (%) | Lexemes (number) | Forms (number) | Senses (number) |
|---|---|---|---|---|---|---|---|---|---|
| East Slavic (2) | Russian | ru | 8775094 | 8.62 | 39399582 | 38.68 | 101554 | 1238078 | 12833 |
| | Ukrainian* | uk | 6093531 | 5.98 | 59582944 | 58.5 | 16258 | 507956 | 283 |
| Turkic (10) | Altai | alt | 2247 | 0 | 10 | 0 | 1 | 0 | 1 |
| | Azerbaijani* | az | 1041742 | 1.02 | 1007804 | 0.99 | 28 | 22 | 22 |
| | Bashkir | ba | 594414 | 0.58 | 6570985 | 6.45 | 16 | 3 | 14 |
| | Crimean Tatar | crh | 398457 | 0.39 | 530708 | 0.52 | 13 | 10 | 15 |
| | Chuvash | cv | 727163 | 0.71 | 786208 | 0.77 | 11 | 1 | 11 |
| | Kazakh* | kk | 936517 | 0.92 | 929021 | 0.91 | 20 | 2 | 18 |
| | Karachay-Balkar | krc | 456559 | 0.45 | 102963 | 0.1 | 2 | 0 | 2 |
| | Sakha (Yakut) | sah | 475916 | 0.47 | 240898 | 0.24 | 8 | 0 | 8 |
| | Tatar | tt | 980788 | 0.96 | 6509824 | 6.39 | 8 | 0 | 8 |
| | Tuvan | tyv | 455568 | 0.45 | 94971 | 0.09 | 5 | 0 | 5 |
| Mongolic (2) | Buryat | bxr | 457138 | 0.45 | 142643 | 0.14 | 3 | 0 | 3 |
| | Kalmyk | xal | 455456 | 0.45 | 144964 | 0.14 | 12 | 2 | 14 |
| Indo-European (3) | Pontic Greek | pnt | 453735 | 0.45 | 144974 | 0.14 | 1 | 0 | 1 |
| | Ossetian | os | 685107 | 0.67 | 808088 | 0.79 | 7 | 1 | 7 |
| | Yiddish* | yi | 677620 | 0.67 | 2889421 | 2.84 | 276 | 704 | 330 |
| Northeast Caucasian (5) | Avar | av | 458950 | 0.45 | 145011 | 0.14 | 6 | 0 | 6 |
| | Chechen | ce | 990769 | 0.97 | 1186533 | 1.16 | 10 | 1 | 10 |
| | Ingush | inh | 456006 | 0.45 | 6984 | 0.01 | 4 | 0 | 4 |
| | Lak | lbe | 452758 | 0.44 | 2 | 0 | 5 | 0 | 5 |
| | Lezgian | lez | 458967 | 0.45 | 103329 | 0.1 | 6 | 0 | 6 |
| Northwest Caucasian (2) | Kabardian | kbd | 455580 | 0.45 | 28 | 0 | 7 | 1 | 7 |
| | Adyghe | ady | 451459 | 0.44 | 13 | 0 | 7 | 10 | 7 |
| Finno-Ugric (10) | Finnish* | fi | 7274103 | 7.14 | 34355848 | 33.73 | 636 | 8383 | 569 |
| | Permyak | koi | 458804 | 0.45 | 145570 | 0.14 | 6 | 0 | 6 |
| | Komi | kv | 461827 | 0.45 | 344 | 0 | 28 | 36 | 24 |
| | Moksha | mdf | 457818 | 0.45 | 181892 | 0.18 | 3 | 0 | 3 |
| | Meadow-Eastern Mari | mhr | 669640 | 0.66 | 679351 | 0.67 | 3 | 1 | 3 |
| | Hill Mari | mrj | 464752 | 0.46 | 7 | 0 | 4 | 0 | 4 |
| | Erzya | myv | 494156 | 0.49 | 31494 | 0.03 | 10 | 2 | 9 |
| | Livvi-Karelian | olo | 469880 | 0.46 | 20 | 0 | 2 | 0 | 2 |
| | Udmurt | udm | 459479 | 0.45 | 193 | 0 | 7 | 0 | 7 |
| | Veps | vep | 517757 | 0.51 | 31895 | 0.03 | 15 | 34 | 17 |

Russian Federation languages with Wikipedias in incubator (45)

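| Group | Language ([[MainPage]]) | Code ([[Category:]]) | Labels (number) | Labels (%) | Descriptions (number) | Descriptions (%) | Lexemes (number) | Forms (number) | Senses (number) |
|---|---|---|---|---|---|---|---|---|---|
| Turkic (10) | Northern Altai | atv | 0 | 0 | 0 | 0 | | | |
| | Shor | cjs | 0 | 0 | 0 | 0 | | | |
| | Chulym | clw | 0 | 0 | 0 | 0 | | | |
| | Dolgan | dlg | 0 | 0 | 0 | 0 | | | |
| | Krymchak | jct | 0 | 0 | 0 | 0 | | | |
| | Karaim | kdr | 0 | 0 | 0 | 0 | | | |
| | Khakas | kjh | 0 | 0 | 0 | 0 | | | |
| | Kumyk | kum | 153455 | 0.1507 | 1 | 0 | 1 | 0 | 1 |
| | Nogai | nog | 0 | 0 | 0 | 0 | | | |
| | Siberian Tatar | sty | 153501 | 0.1507 | 14 | 0 | 1 | 0 | 1 |
| Tungusic (3) | Even | eve | 0 | 0 | 0 | 0 | | | |
| | Evenki | evn | 0 | 0 | 0 | 0 | | | |
| | Nanai | gld | 1 | 0 | 0 | 0 | 3 | 0 | 3 |
| Indo-European (2) | Judeo-Tat (Juhuri) | jdt | 0 | 0 | 0 | 0 | | | |
| | Tat | ttt | 0 | 0 | 0 | 0 | | | |
| Northeast Caucasian (10) | Aghul | agx | 0 | 0 | 0 | 0 | | | |
| | Andi | ani | 0 | 0 | 0 | 0 | | | |
| | Tsez | ddo | 0 | 0 | 0 | 0 | | | |
| | Dargwa | dar | 0 | 0 | 0 | 0 | | | |
| | Rutul | rut | 0 | 0 | 0 | 0 | | | |
| | Tabasaran | tab | 0 | 0 | 0 | 0 | | | |
| | Tindi | tin | 0 | 0 | 0 | 0 | | | |
| | Tsakhur | tkr | 0 | 0 | 0 | 0 | | | |
| | Udi | udi | 0 | 0 | 0 | 0 | | | |
| | Abaza | abq | 0 | 0 | 0 | 0 | | | |
| Finno-Ugric (8) | Ingrian | izh | 0 | 0 | 0 | 0 | | | |
| | Khanty | kca | 0 | 0 | 0 | 0 | | | |
| | Karelian | krl | 455915 | 0.4476 | 0 | 0 | 7 | 0 | 3 |
| | Livonian | liv | 454502 | 0.4462 | 0 | 0 | 7 | 0 | 7 |
| | Mansi | mns | 0 | 0 | 0 | 0 | | | |
| | Kildin Sámi | sjd | 155184 | 0.1524 | 9 | 0 | 2 | 0 | 2 |
| | Ter Sámi | sjt | 0 | 0 | 0 | 0 | | | |
| | Votic | vot | 481454 | 0.4727 | 0 | 0 | 4 | 0 | 3 |
| Samoyedic (4) | Enets | enf | 0 | 0 | 0 | 0 | | | |
| | Nganasan | nio | 0 | 0 | 0 | 0 | | | |
| | Selkup | sel | 0 | 0 | 0 | 0 | | | |
| | Nenets | yrk | 0 | 0 | 0 | 0 | | | |
| Paleosiberian (7) | Alyutor | alr | 0 | 0 | 0 | 0 | | | |
| | Chukchi | ckt | 0 | 0 | 0 | 0 | | | |
| | Itelmen | itl | 0 | 0 | 0 | 0 | | | |
| | Ket | ket | 0 | 0 | 0 | 0 | | | |
| | Koryak | kpy | 0 | 0 | 0 | 0 | | | |
| | Nivkh | niv | 0 | 0 | 0 | 0 | | | |
| | Southern Yukaghir | yux | 0 | 0 | 0 | 0 | | | |
| Eskaleut (1) | Chaplino dialect | ess | 0 | 0 | 0 | 0 | | | |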

References

  1. Farhad Fatkullin. Wikipedias in the languages of Russia today and tomorrow: why and how? Finno-Ugric Wikiseminar, Petrozavodsk, 6-9 May 2016.
  2. Alexander V. Markov in http://amp.gs/j8d9b per Frans de Waal, 2007
  3. Gainulla F. Shaykhiev. The Language of Reason. We think in Tatar, Russian, English, ... simultaneously... Kazan: 2000 (Russian) [Jazyk Razuma. My dumaem i po-tatarski, i po-russki, i po-angliyski...]
  4. Farhad Fatkullin. Mythic consciousness and the intangible cultural environment. Empirical analysis using multilingual Wikipedia materials readership structure and statistics. 1st Russia-wide "Language, Society and Information Technologies" Scientific and Practical Conference (19-20.02.2022). (Tatar) Provisions (Russian), Information Letter (Russian), Program (Russian).
  5. Farhad Fatkullin. Tatar Language and Culture Digital Sustainability Ecosystem. Hows and Whys. 8th Tatar Language and Literature Teachers Russia-wide Convention. Roundtable of the «Multicultural Education as a Factor in Developing Child's Identity and Ethnic Self-Consciousness» Section, 28th June 2022.
  6. Farhad Fatkullin. Digital Ecosystem for Tatarstan's Linguistic and Cultural Diversity Sustainability. What, why and how?. «Preservation and Development of Native Tongues within a Multiethnic State: Language Policy, Challenges and Prospects» Interregional Scientific-Practical Conference (co-organized by UNESCO Information for All Program Committee for Russia, Russia's Federal Agency for Ethnic Affairs, etc.), «Role of ICT in Language Preservation» section, Kazan, Republic of Tatarstan Academy of Sciences Sh.Mardjani Institute of History, 21.06.2022.
  7. Farhad Fatkullin. Wikipedias as language speakers community cultural transformation catalyst. Wiki-Conference Russia 2022.