Lingua Libre/2022 wishlist
- Note: This page has been moved to Meta, where the Visual Editor is usable and tables are easier to maintain (Edit page with Visual Editor).
In late March 2022, Wikimedia France is establishing its budget for the period of July 2022 to June 2023.
Please share here what you think we should get done this year on Lingua Libre. Feel free to add projects of yours that would require funding, as well as bugs and foreseeable technical needs.
Please remember to link Phabricator tickets to the bugs and technical issues you raise. A maximum of 10 suggestions per person would be best.
Approach
- Editorial vision for this raw wishlist
Phase 1, clarify needs and assess priorities: digest the raw wishlist into a clean document usable to strategize efforts.
- Strategic awareness: get several members to read and understand this full list and acquire a global vision 👉🏼 Yug, Poslovitch.
- Clean up: clarify and harmonize these raw texts; divide vertically into identifiable missions and skill-based clusters ; divide horizontally into distinct and non-overlapping topics.
- Evaluate: estimate workload, cost, importance, urgency, priority.
- Map missions: keep an updated DAG map of those missions.
Phase 2, outreach documents: create derivative documents with selected missions, written in formats suitable for key target audiences such as
- board members (financial decision makers): focus on strategic importance, desired outcome, cost estimates.
- developer team: provide strategic importance, time estimate, entry point to start coding.
- marketing team: provide strategic importance, time estimate, direction to start writing documents and campaigns.
- Note on estimates
- We assume access to skilled developers at an affordable price: 250€/day, 1,250€/week, 5,000€/month. Single-day sprint at 500€. Lacking such skills at these rates would affect the project.
- Project management coordination, technical requirements writing, calls for hires, administrative work, intermediate and final review and testing are not included. Expect about 1 person-year, 40k€.
- The cost of outreach to local minorities for field linguistics is not included.
- Occasional travel, hosting, meal costs are not included. Expect 10k€.
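The rates above can be turned into a small worked calculation. The sketch below is only an illustration: the function name and its parameters are hypothetical, and billing a lone day at the 500€ sprint rate is our reading of the note, not a stated rule.

```python
# Rates copied from the note on estimates; everything else is illustrative.
DAY_RATE = 250       # euros per day
WEEK_RATE = 1_250    # euros per week
MONTH_RATE = 5_000   # euros per month
SPRINT_RATE = 500    # euros for a single-day sprint


def estimate_budget(days: int = 0, weeks: int = 0, months: int = 0) -> int:
    """Return a budget in euros for a task of the given duration.

    Assumption: a task lasting exactly one day is billed as a sprint.
    """
    if (days, weeks, months) == (1, 0, 0):
        return SPRINT_RATE
    return days * DAY_RATE + weeks * WEEK_RATE + months * MONTH_RATE


print(estimate_budget(days=1))    # 500
print(estimate_budget(days=3))    # 750
print(estimate_budget(weeks=2))   # 2500
print(estimate_budget(months=3))  # 15000
```

Most budgets in the wishlist tables match this arithmetic; rows that deviate from it reflect a judgment by the submitter rather than the flat rates.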
- Contextual dates and deadlines
To keep synchronized with lingualibre:Lingualibre:Events/Program.
- Done 2022.03.11: Adelaide initiated the 2022-2023 Lingualibre wishlist.
- Done 2022.06.01 (event): Updated wishlist shared, call for improvement.
- Done 2022.06.05 (deadline): Celtic Knot <5mn prerecorded video.
- Done 2022.06.12 (deadline): Wikimania 2022 submission deadline. See possible formats.
- Done 2022.06.19 (event): Public forum of local community associations in Toulouse, with a Lingualibre stand and our Occitan Gascon team Lo Congres.
- Done 2022.06.21-23 (event): LREC / Languages Resources and Evaluation Conference in Marseille. Opportunity to share our vision with researchers of language resources.
- Done 2022.07.01-02 (event): Celtic Knot Conference 2022, for “communities working on a minority language on the Wikimedia projects”, See requested format
- [ 2022.07.xx (event idea): several outreach and recording sessions at INALCO ]
- Done 2022.07.08-10 (event): Wikimedia France's Wikicamp. Opportunity to share our vision with WMFr's community.
- 2022.08.03-06? (event, W/T/F/S/Sunday): Wikimedia France Hackathon.
- 2022.08.11-14 (event, T/F/S/Sunday): Wikimania Online. Opportunity to share our vision with the global WMF's community.
- 2022.09.xx (deadline): Wikimedia Foundation community fund, Alliance fund applications deadlines. See {template:Grants table}.
- 2022.11.19-20 (event): WikiConvention, Paris.
Wishlist
- See also :phabricator:LinguaLibre.
Section 1 : RecordWizard
Submitted by | Definition & Evaluation | Estimated costs | ||||
---|---|---|---|---|---|---|
User | Project | Title | Description | Priority | Time | Budget |
RecordWizard | ||||||
0x010C | RecordWizard | Critical fixes | Mitigation of several major bugs : audio clicks… | ★★★★★ | ~1 month | 5,000€ |
0x010C, Yug | RecordWizard | Sharable click-and-record link :phab:T313575 | Sharable RecordWizard URLs that pass settings such as locutor (Qid), language recorded (Qid), local wordlist used (title), etc. as URL parameters to prefill RW's form. Motivation: This allows experienced users to send non-tech-literate speakers a click-and-record link. | ★★★☆☆ medium | ~2 weeks | 2,500€ |
0x010C | RecordWizard | Enhance the Tutorial step :phab:T266843 | ? | ★★☆☆ | ~2 weeks | 2,500€ |
Yug | Audio data | Investigate Click bug T281041 | Review recent users' recording to properly assess prevalence of audio defects. See also Property:P33 `type of audio file issue`. Need to hand review at least 200 files. | ★★★★★ | 1 week | 1,250€ |
0x010C | RecordWizard | Automatic audio quality check T290010 | ? | ? | ~1 month | 5,000€ |
0x010C | RecordWizard | Automatic audio quality tagging :phab:T303680 | ? | ? | ~1 month | 5,000€ |
Yug | RecordWizard | RecordWizard working offline :phab:T313574 | Ability for the RecordWizard to operate offline | ★★★★☆ high | ~2 months | 10,000€ |
List loader | ||||||
Yug | RecordWizard | List loader handles dictionary (github):phab:T212671 | Marginalized communities with no wordlist available require the creation of a minimalist bilingual dictionary, translated from the local macro language into our target minority language, in order to create that first wordlist. The list loader could easily be made resilient enough to load such a bilingual dictionary. Format is # L1 → L2 , see also Help:List translation. | ★★★★☆ high | 1 day | 500€ |
Yug | RecordWizard | List loader handles metadata (github) | The metadata format still needs to be designed. Ex: # rouge [pos:noun;french:mot;ipa:/ɹuːʒ/;…] . This has deeper implications. It needs to be both human and machine editable, as it could become a place where humans create dictionaries and machine-readable data for Wikidata lexemes. See also Handedict (ask Yug). | ★★★☆☆ medium | 1 day | 500€ |
Yug | RecordWizard | List loader handles HTML comments, wiki <noinclude> (github):phab:T212671 | HTML comments and noinclude contents are automatically removed. | ★★☆☆☆ medium | 1 day | 500€ |
Poslovitch | RecordWizard | List loader filters lists by list type. :phab:T313478 | List types: Frequent words lists ; Never recorded words lists ; Thematic lists ; Requested by community. User-speakers can pick from a tickable list what they want to focus on, and be suggested the right wordlists. | ★★★☆☆ medium (sugar) | ~1 month | 5,000€ |
Yug | RecordWizard | List loader has priority system :phab:T313500 | List loader can discriminate higher quality lists for a language. | ★★★☆☆ medium | 1 week | 1,250€ |
Rdrg109 | RecordWizard | List generator > Based on Lexemes without audios :phab:T283802 | Create a list generator based on Wikidata lexemes' words and sentences (Property:P5831 `usage example`) with no Property:P443 `audio pronunciation`. Note: On 2022/03/18, only 1 usage example has a pronunciation audio ; of the 129,942 English forms, only 340 have pronunciation audios (i.e. ~0.26%). More statistics here, discussion here. Note: In RW, step 3, the External Tools list loader could provide a few built-in examples, including this one. Implies JS, OO.ui.js skills. | ★★★☆☆ medium | 1 week | 1,250€ |
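The three list loader wishes above (bilingual `# L1 → L2` lines, bracketed metadata, and removal of HTML comments and `<noinclude>` blocks) can be sketched together as one parsing pass. This is a minimal illustration in Python, not the RecordWizard's actual code: the function name, the returned dictionary keys, and the choice to treat the right-hand side of `→` as the word to record are all assumptions, and the metadata syntax is only the example quoted in the table, since that format is still to be designed.

```python
import re

# Patterns for content the list loader should drop before parsing.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
NOINCLUDE_RE = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL | re.IGNORECASE)
# Hypothetical metadata syntax: "word [key:value;key:value]".
METADATA_RE = re.compile(r"^(?P<word>.*?)\s*\[(?P<meta>[^\]]*)\]\s*$")


def parse_wordlist(wikitext: str) -> list[dict]:
    """Parse list lines of the form '# word', '# L1 → L2' or '# word [k:v;...]'."""
    text = COMMENT_RE.sub("", wikitext)
    text = NOINCLUDE_RE.sub("", text)
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # only '#' items are list entries
        body = line[1:].strip()
        if not body:
            continue
        entry = {"word": body, "source": None, "metadata": {}}
        match = METADATA_RE.match(body)
        if match:
            body = match.group("word")
            for pair in match.group("meta").split(";"):
                key, sep, value = pair.partition(":")
                if sep:  # skip fragments without a 'key:value' shape
                    entry["metadata"][key.strip()] = value.strip()
        if "→" in body:
            # Assumption: L1 is the source language, L2 the word to record.
            source, _, target = body.partition("→")
            entry["source"] = source.strip()
            body = target.strip()
        entry["word"] = body
        entries.append(entry)
    return entries


sample = """<noinclude>Help text</noinclude>
# red → rouge
# rouge [pos:noun;french:mot;ipa:/ɹuːʒ/]
<!-- # hidden -->
# bleu
"""
for item in parse_wordlist(sample):
    print(item)
```

Keeping the metadata inside plain brackets on the same line leaves the list editable by hand, which matches the stated requirement that the format be both human and machine editable.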
Section 2 : MediaWiki
Submitted by | Definition & evaluation | Estimated costs | ||||
---|---|---|---|---|---|---|
User | Project | Title | Description | Priority | Time | Budget |
MediaWiki maintenance | ||||||
0x010C | MediaWiki | Local MediaWiki enhancements | Lingualibre's MediaWiki can be enhanced for a better user experience: the main search bar, Special pages and wikicode-editing UI (Special:Search, Special:RecentChanges,...). Better UX would increase user retention. | ? | 1.5 month | 8,000€ |
Poslovitch | MediaWiki | Update/Upgrade 1.35.5 | MediaWiki 1.35 requires a few security upgrades. The next LTS version (1.39) is expected in November 2022; moving to it is recommended to stay up to date, remain compatible with MediaWiki extensions, and keep our site and users safe. Upgrades also require numerous small corrections to LinguaLibre's core RecordWizard extensions. | ? | 2 months | 20,000€ |
Poslovitch | MediaWiki | Extension install: MLEB | MediaWiki Language Extension Bundle is a pack of extensions that should be updated "as a group" and not individually (and attempting to do so in December did not yield any success). As brought by T295250, updating the MLEB would allow the use of a "tvar" syntax (which I'm unfamiliar with) | ★★★☆☆ medium | 1 week | 1,250€ |
Yug | MediaWiki | Extension : Translate | Translate extension to update. | ★★☆☆☆ | 1 day | 500€ |
Yug | MediaWiki | Extension install : Visual Editor | Visual Editors would help to co-edit wordlists and documents with elder or less computer-educated collaborators. Field work has shown this demographic is over-represented among minority and endangered languages speakers willing to contribute their voice and lexical knowledge to Wikimedia. | ★★★☆☆ medium | 3 days | 750€ |
Yug | MediaWiki | Extension install : Template Styles | Template style would ease creation and maintenance of stylized templates, most notably navbox. This need arose recently | ★★☆☆☆ | 3 days | 750€ |
Yug | MediaWiki | Extension creation : Languages gallery | Create an extension based on Common Voice > Languages gallery https://commonvoice.mozilla.org/en/languages (Mozilla Public License) | ★★☆☆☆ | 2 weeks | 2,500€ |
Poslovitch | Wikibase | Database performance improvement :phab:T312537 & ... | The SPARQL endpoint is impractically slow, which makes the current Sound library non-functional (too slow). Performance must be improved 100 fold. Making an intermediary duplicated database could solve this strategic weak point. | ★★★★☆ high | 1 month | 5,000€ |
0x010C | Search engine / Sound library | Responsive search engine / gallery :phab:T252321 | Provide a proper, fast-responding search engine in order to showcase our wealth of audio recordings and attract a larger public. | ★★★★★ high (critical) | ~3 months | 15,000€ |
Yug | Search engine / Sound library | Learning management system (basic) | Provide a sustained personal learning experience. Visitors can favorite recorded words to add them to their personal learning list. Words have a self-assessment system `saved/learning/mastered`. Motivation: endangered language communities and general language learners have repeatedly asked for learning tools, for language revitalization or language learning. Providing such added value will attract those speakers and learners. | ★★★☆☆ medium | 2 months | 10,000€ |
Marreromarco | Search engine / Sound library | Learning management system (full) | Provide a solid and elegant word-oriented and user-centered learning experience. Motivation: Same as above. This model is followed successfully by the for-profit competitor Forvo. Failing to make this move, Lingualibre will remain a tool for nerds and Wikimedians only, thereby willfully choosing to fail at scaling up and outreach. | ★★★★★ high (critical) | 6 months | 30,000€ |
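The basic learning management wish above describes a personal list whose words move through the states `saved/learning/mastered`. A minimal sketch of that data structure could look as follows; the class and method names are hypothetical, and the recording identifiers are placeholder strings, not real Lingua Libre items.

```python
from enum import Enum


class Status(Enum):
    """Self-assessment states named in the wishlist."""
    SAVED = "saved"
    LEARNING = "learning"
    MASTERED = "mastered"


class LearningList:
    """Hypothetical per-visitor list of favorited recordings."""

    def __init__(self) -> None:
        self._items: dict[str, Status] = {}

    def favorite(self, recording_id: str) -> None:
        # A newly favorited recording starts as 'saved'; re-favoriting keeps its state.
        self._items.setdefault(recording_id, Status.SAVED)

    def assess(self, recording_id: str, status: Status) -> None:
        # Only recordings already on the list can be self-assessed.
        if recording_id not in self._items:
            raise KeyError(f"{recording_id} is not in the learning list")
        self._items[recording_id] = status

    def by_status(self, status: Status) -> list[str]:
        return [rid for rid, s in self._items.items() if s is status]


learner = LearningList()
learner.favorite("recording-A")  # placeholder identifiers
learner.favorite("recording-B")
learner.assess("recording-A", Status.LEARNING)
print(learner.by_status(Status.SAVED))     # ['recording-B']
print(learner.by_status(Status.LEARNING))  # ['recording-A']
```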
Section 3 : others
Submitted by | Definition & evaluation | Estimated costs | ||||
---|---|---|---|---|---|---|
User | Project | Title | Description | Priority | Time | Budget |
Various | ||||||
0x010C | RecordWizard / Bot / Dedicated Extension | Mass edit tools | Tools to help experienced users do maintenance tasks: patrolling audios, batch-editing recording elements, batch-importing records… | ★★★☆☆ medium | ~3 months | 15,000€ |
Poslovitch | UI > Dataset page | Datasets page revamp to be elegant. :phab:T313572 | The Datasets index is unsightly, even though it displays our entire, valuable output meant for reuse. A revamp is necessary. See competition https://commonvoice.mozilla.org/fr/datasets | ★★★☆☆ medium | 1 week | 400€ ? |
Yug | UI > Languages gallery | Languages gallery page, elegant. :phab:T313397 | Languages statistics should be queried (slow), then copied and stored, periodically. Some elegant HTML, CSS should then generate via JS a full language page, if possible with filter feature by language and ISO (VueJS or VanillaJS recommended). See competition https://commonvoice.mozilla.org/en/languages | ★★★☆☆ medium | 2 weeks | 2,500€ |
Marreromarco | UI > Request pronunciations form | Form for Requested pronunciation notifies Native speakers | Provide a form to submit word requests in a given language. Words are appended to a [[List:{iso}/Requested by community]]. Volunteer native speakers are notified. Motivation: It is very useful for language learners to request the specific word or phrase whose pronunciation they are unsure of. Forvo offers such a function and users make very creative requests. It is also especially helpful for technical terms and proper names | ★☆☆☆☆ low | 2 weeks | 2,500€ |
Poslovitch | Docker | Create a proper development environment :phab:T313573 | Create a proper Integrated Development Environment (IDE) for the PHP, JS/VUEJS, CSS, HTML stack used by MediaWiki. Such tools are central to 1) allow developers to dive rapidly into the code of MediaWiki and its extensions; and 2) allow volunteer developers to ensure changes to the RecordWizard and other extensions do not risk causing issues downstream. Motivation: since 2021, volunteer developers have persistently attempted to create such a tool without success. | ? | 1 month | 5,000€ |
Poslovitch | Various | Implement the Lists suggestions from July 2021's Hackathon | Ideas from July 2021's Hackathon would improve the UX for lists and improve their discoverability | ? | 3 months | Unknown |
Outreach | ||||||
Marreromarco | PR > General Public Relations Campaign > Blogs | Promote LinguaLibre via posts | An underused avenue to promote the project is to write posts on blogs, social media, magazines, newspapers, create YouTube videos, etc. LinguaLibre now has notable and peculiar stories like for Gascon (2019), Cantonese (2021), Sicilian (2022), Surui (2022), which could be shared more broadly. A PR Campaign is necessary in 2022-2023 to increase the number of active contributors and become a viable FOSS alternative to Forvo. | ★★★★☆ high | 6 months | 6,000€ (Internship) |
Marreromarco | PR > General Public Relations Campaign > As learning tool | Promote LinguaLibre as a learning tool | Promote the website to attract language learners, with invitations to contribute their voices on missing languages. | |||
Marreromarco | PR > "Month of Voices" | Lobby for a "Month of Voices" | Propose to Wikimedia Headquarters the development of a "Month of Voices" in which LinguaLibre would be promoted on Wikipedia Articles in the Section of "Languages" at the left side of the Main Page. The idea was discussed previously: LinguaLibre:Events/Winter 2021-2022 Public Relations Campaign. | ? | 6 months | 6,000€ (Internship) |
Peripherical projects (not our stack) | ||||||
Marreromarco | ? > Anki Integration with LinguaLibre | An Anki Add-on would be helpful for language learners | ★☆☆☆☆ low (?) | 1 month | 5,000€ | |
Languageseeker | Wikidata Lexeme, bot. | Pull common linguistic data from Wiktionaries to Wikidata | Some linguistic data (part of speech, pronunciation, conjugation, etc) is universal and would be useful to be able to pull from Wikidata. However, most of it is currently manually entered on Wiktionaries. This would pull these common bits into Wikidata. Part of this project would involve developing a system for representing linguistic data in Wikidata. It will enable the disambiguation of heteronyms. | ★☆☆☆☆ Out of scope | 3 months | 15,000€ |
Rdrg109 | External plugin ? > gather sentences (?) | Extracting sentences from any audio stream for their inclusion in Lingua Libre. | Each extracted audio would correspond to a sentence. Each sentence could be added to lexemes as a "usage example". Having usage examples with pronunciation audios makes Wikidata lexicographical data more useful. With SPARQL, we could then answer questions of the style: Usage examples with pronunciation audios that were retrieved from interviews where the participant is a native speaker of that language. More information about this idea in this page. | ★☆☆☆☆ out of scope ? | 3 months | Unknown (I have little experience with MediaWiki development so it will be more of a learning experience) |
Yug | Unilex | Revive UNILEX data gathering | UNILEX is an open-license, one-shot project by Google that scraped the internet via basic Python scripts to build frequent-word lists in 1001 languages. The technology used is basic and efficient. Wikimedia could help crowdsource this project's index of websites, in order to cover more languages with better wordlists. This would support field lexicography workshops for minority languages. See also Lingualibre:Events/2021 UNILEX-Lingualibre. | ★☆☆☆☆ out of scope ? | 1 week | 1,250€ Depends on ambition. |
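The "Form for Requested pronunciation" wish in the table above appends submitted words to a `List:{iso}/Requested by community` page. The append step could be sketched as below, assuming list pages hold one `# word` item per line; the function names are hypothetical, and both the MediaWiki page edit and the notification of native speakers are left out.

```python
def requested_list_title(iso: str) -> str:
    """Build the page title used for community requests in one language."""
    return f"List:{iso}/Requested by community"


def append_requests(list_text: str, words: list[str]) -> str:
    """Return the list wikitext with new words appended, skipping duplicates."""
    existing = {
        line[1:].strip()
        for line in list_text.splitlines()
        if line.startswith("#")
    }
    lines = list_text.rstrip("\n").splitlines() if list_text.strip() else []
    for word in words:
        word = word.strip()
        if word and word not in existing:
            lines.append(f"# {word}")
            existing.add(word)
    return "\n".join(lines) + "\n"


print(requested_list_title("oci"))  # List:oci/Requested by community
print(append_requests("# ostal\n", ["ostal", "cat", " cat "]), end="")
```

Deduplicating at append time keeps the requested list short even when several learners ask for the same word.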
Directed Acyclic Graph
- This is an exploratory work.
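As a starting point for the mission map, the sketch below stores missions as a directed acyclic graph and derives one valid working order with Python's standard `graphlib` module. The mission names come from the wishlist tables, but every dependency shown is a hypothetical example to be replaced once the real ordering has been agreed.

```python
from graphlib import TopologicalSorter

# Mission -> missions that must be finished first.
# All dependencies here are illustrative assumptions, not decisions.
missions = {
    "Investigate click bug": set(),
    "Critical fixes": {"Investigate click bug"},
    "Automatic audio quality check": {"Critical fixes"},
    "Automatic audio quality tagging": {"Automatic audio quality check"},
    "Database performance improvement": set(),
    "Responsive search engine": {"Database performance improvement"},
    "Learning management system (basic)": {"Responsive search engine"},
}

# static_order() raises graphlib.CycleError if the map is not acyclic.
order = list(TopologicalSorter(missions).static_order())
for step, mission in enumerate(order, start=1):
    print(step, mission)
```

A cycle error here would flag two missions that each wait on the other, which is a useful check whenever the map is updated.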
Submit additional ideas
Add below to submit an additional wish.
- Improve the audio review system.
- Outreach > Present Lingualibre to WMfr as strategic for diversity, revitalization.
- Outreach > Fundraising
Exploring wikibase migration
The following contents are identified as requiring specific migration efforts:
- Wikibase : records -- Commons wikibase
- Wikibase : languages -- Wikidata wikibase
- Wikibase : speakers -- ?
- Wikibase : properties -- ?
- Mediawiki : wikipages (Main, Help) -- Commons project space
- Mediawiki : Lists (editable, MIT-like license) -- ?
- Mediawiki : js scripts lingualibre:MediaWiki:Common.js -- delete or Lingualibre.org pure web website
- Mediawiki : LTR and RTL support
- Mediawiki : login system / Oauth
- Services (= menu) -- Lingualibre.org pure web website