Talk:Wikispeech

This page has earlier discussions at mw:Talk:Wikispeech

Redirect

I think the soft-redirect was a better idea, to reduce duplication. Information on this project is already sparse enough. Nemo 19:49, 14 August 2016 (UTC)

Gujarati language text to speech

Blue Rasberry (talk) 17:26, 21 December 2020 (UTC)

Thanks for the notice, Bluerasberry! Eric Luth (WMSE) (talk) 16:23, 15 January 2021 (UTC)

Mozilla Common Voice

Mozilla is developing a TTS called Mozilla Common Voice with crowdsourced speech data. Is this project related? Do they share the speech data? They also have an Android app for users to provide data more easily. Could it be reused for Wikispeech? --62.98.122.121 13:26, 21 March 2021 (UTC)

Common Voice is a project that aims to gather as many audio recordings as possible, in order to build a massive and representative dataset for each language. Mozilla uses Common Voice's data to train their second NLP project, called DeepSpeech, an STT engine. I'm not aware of any TTS project by Mozilla at the moment. Regards, WikiLucas (🖋️) 17:40, 21 March 2021 (UTC)
Hello @WikiLucas00:, thank you for the feedback and the clarification on what Mozilla has already launched. I still can't tell whether their database is really of no interest to Wikispeech. Maybe Mozilla is not focusing on a text-to-speech engine, but that doesn't make their data useless. They gather many sentences pronounced by many people. That sounds like a great data set to train some AI, including for text to speech, doesn't it? Psychoslave (talk) 14:57, 26 March 2021 (UTC)

Some feedback and question

Hello everybody,

I only discovered this great project now, thanks to @Denny, who sent a message on the Telegram channel.

The FAQ says that you had the opportunity to do a thorough investigative study before [you] started. Is there a report associated with this study? I think it could help people interested in the topic to better understand the background.

It also indicates that you consult disability organizations in Sweden. That seems like a very good thing for the project's success. It makes me wonder: are there already people with disabilities directly involved in the project? If not, is that already planned?

The project seems to address two main audiences: people with disabilities and people with literacy difficulties. For the latter, I think a complementary approach would be to make the Wikimedia environment a host for useful tools and pedagogic material to learn (and teach) how to read. Are you aware of such a complementary project?

For now, if I understand correctly, there is no platform to record and look at collected material. Is that correct?

The project pages mention existing commercial solutions several times, but I didn't see any statement about existing FLOSS solutions, like Orca. I guess you already know about it, but it's better to make sure. Maybe you even assessed it and that led you to set it aside completely as far as this project is concerned. Or maybe you actually plan to collaborate with Orca's team, so you could help each other improve your solutions. I didn't find any information that would tell me whether either of these hypotheses is close to reality. Could you enlighten me on that point please? 🙏

I see in the video presentation that @Lyokoï: (or someone who looks very much like him 😂) followed the session. So I guess that you already know Lingua Libre (LL). Actually, updating the Wikispeech/Wikispeech_2019 page, I see that @Sebastian Berlin (WMSE): added today that you did know about that, as well as about Mozilla and the Common Voice (CV) project. So it sounds like awareness of potential partnerships is good on those fronts.

I understand that the overall project has its own well-defined goals that are very different from those of LL and CV. I think I simply lack a good overview of the project to understand this point better, as I fail to see how the specific sub-project of a new collecting platform would be indispensable here. What significantly different features would it provide that would make the development cost of a new platform worthwhile, as opposed to adapting/reusing things from CV or LL?

Last question: the project doesn't specify any license. It just seems to use "freely licensed speech data" wherever the topic comes up. LL is CC-by-sa-3.0 if I remember correctly, and CV is CC-0. What about this project?


So in a nutshell, here are my questions:

  1. Is there a report associated with this study?
  2. Are there already people with disabilities directly involved in the project, or a plan to involve them?
  3. Are you aware of a complementary Wikimedia project to help people learn basic reading skills?
  4. Is there already a platform to record and look at collected material?
  5. What is your point of view on Orca or any other FLOSS TTS solution?
  6. What features would be better implemented in a brand new project rather than integrated into existing platforms such as CV and LL?
  7. Have you already decided which license will cover the collected material?

Thanks for all that you have already accomplished, and thanks in advance for your reply. Psychoslave (talk) 16:49, 26 March 2021 (UTC)

Hi @Psychoslave. Firstly, huge apologies for not noticing this question before. Despite it being a fair bit later, I still hope the answers will be useful.
1. The report of the pilot study for the speech data collector can be found at wmse:Fil:Bilaga – Rapport för Wikispeech taldatainsamlarens förstudie.pdf (in Swedish). The report for the pilot study for the text-to-speech component can be found at wmse:Fil:Wikispeech - Bilaga 1 Huvudrapport.pdf (in Swedish).
2. There is currently a request for funding being considered around this. Should that receive funding then there is a plan for a person with visual disabilities to be directly involved in the project.
3. I am not aware of such a complementary approach.
4. There is no platform available to test yet for speech recording. The text-to-speech platform can be used on Wikipedia today through an on-wiki Gadget (on Swedish Wikipedia) and the underlying script could be copied across to English or Arabic Wikipedia. See e.g. mw:Help:Extension:Wikispeech#As_gadget_or_user_script.
5. Orca was not explicitly on our radar. This is because our initial approach was that we explicitly wanted a solution which did not require the user to install any software on their end, in part because such a requirement is a limitation for users who primarily access our sites through mobile devices (with limited software support) or through computers where they cannot install software (e.g. library computers). In the backend Wikispeech makes use of pre-existing FLOSS TTS solutions and it's built to be able to swap these out as new solutions arise. Currently the default TTS is MaryTTS, but one aspect of the project currently seeking funding is to look at switching this out for more modern solutions.
6. We were in contact with both LL and CV during the last project (but not directly since then), looking at both what sort of resources they are trying to gather and how they have approached it. Neither of those projects is static of course, but I'll try to outline the main differences as they were when we last looked at this.
  • CV: In the case of CV the types of recordings that they are interested in doing are primarily for voice recognition. As such they are seeking short recordings which can happily include background disturbances and poorer audio quality. This is great for training voice recognition but not as much for producing a text-to-speech voice. The conclusion here was however that any data produced by the Speech Data Collector should also be exportable for use by the CV-team.
  • LL: The LL platform was primarily aimed at short recordings of individual words and names. For text-to-speech training this is valuable in that it serves as a basis for determining the pronunciation of individual words. To train a voice, however, it is less useful since the transition between words is important. Even though LL allows you to record sentences, the plan is for the Speech Data Collector to also add annotations to the speech (to assist the training) and to use manuscripts specifically designed to be speech data dense (so that fewer recorded sentences are needed to train a new voice).
7. The license for the collected speech data (and any lexicographical data) will be CC0.
/ André Costa (WMSE) (talk) 08:32, 7 November 2023 (UTC)

Status?

What is the status of the project? I don't see any dates in the timeline. It seems rather dead to me. So9q (talk) 16:08, 9 October 2023 (UTC)

Hi. Thanks for getting in touch. Wikimedia Sverige has been working on the code in a more limited capacity in recent years. The main focus has been addressing issues raised by users on Swedish Wikipedia (where it is available as a Gadget), maintaining compatibility with MediaWiki (as deployed on Wikipedia) and building out the functionality for improving the pronunciations (editing the lexicon). We have sent in a funding application which, if successful, should allow us to focus on Wikispeech again, including a much-needed focus on making it easier to add support for new languages and add new voices. /André Costa (WMSE) (talk) 13:25, 12 October 2023 (UTC)
Thanks for the update. So9q (talk) 09:06, 13 October 2023 (UTC)
Hello User:André Costa (WMSE), is there a link to share for that funding request? Yug (talk)

Wikimedia text2speech collaborations

Hello WikiSpeech,

I sent the following email to Ona de Gibert[1]:

Email 1:

Hello User:Babeliona,

I saw your presentation at Wikimania[1], where you mentioned that you are part of the University of Helsinki working on OPUS MT models, and in which you laid out the general state of open technologies and minority languages. Thank you very much for this structuring review and mapping of our field.

I'm Hugo Lopez, known as User:Yug, previously wikimedian in residence in Toulouse University. I organized the Wikimania round table « Supporting minority languages and the Wikimedia community ».[2] Our aim was to increase awareness within the larger Wikimedian community.

The project I focus on is Lingualibre.org[3], a free Wikimedia online tool to rapidly record words or sentences in any existing language. The ability to reuse written sentences to rapidly create thousands of clean audio recordings can be an important piece (namely, the training data) for machine learning researchers working on text2speech models.

With this tool to create text+audio pairs, and given the need for text2speech models for minority languages, we Wikimedians want to make progress in that field in 2024-2025. We are therefore looking for ML collaborations to progress toward Text2Speech MLM in less-documented languages.

We would like to identify higher education institutions that could have researchers or students willing to collaborate with Wikimedia on text2speech MLM in the open source spirit.

After that:

  • Wikimedia funding will be discussed this fall, likely in a modest range of 5~20k€.
  • Wikimedians can provide the global reach to minority communities and the data (audio recordings via Lingualibre).
  • Higher education partner institutions could provide hard ML skills in the form of researchers or interns.

Also,

  • Could you share your email so I can add you to my contact list?
  • Would you have some other contacts who could advise us on ML and text2speech collaborations?

Let me know if you want to be kept informed about our progress.

Thank you again for the review of our field.

Best regards,

For the record:

Email 2:

Hello Babeliona,

I noticed some errors in my previous email, corrected them, and shared our email with WikiSpeech.

WikiSpeech, based in Stockholm, is the Wikimedia (Sweden) project prototyping Text2Speech for Wikimedians.

Lingua Libre (training data) and WikiSpeech (ML) are collaborating to identify opportunities (collaborations with researchers) for Wikimedian Text2Speech services.

Best regards.

The aim is to get to know the key Machine Learning actors and possibly Master's degree interns willing to collaborate with us, whether via WikiSpeech funding (WMSE), Lingua Libre funding (WMFR), Google Summer of Code 2025 sponsorship or something else. Yug (talk) 18:04, 12 August 2024 (UTC)

Does/will this match modern AI voice quality?

Does or will the audio created through this match the voice quality of these examples? Prototyperspective (talk) 21:57, 14 October 2024 (UTC)

 
It takes around 30 seconds to fetch the article content (this step is shown in the gif and can take longer depending on the article), around 3 minutes to generate the audio (assuming the settings don't vary), and around 1 minute per audio to add the relevant categories and description and link it to the corresponding Wikidata item (assuming many are done in a row). Prototyperspective (talk) 13:24, 5 November 2024 (UTC)