Wikispeech/Background

About Wikispeech

The idea to develop Wikispeech dates back several years. In close dialogue with NGOs for the visually impaired, we, Wikimedia Sverige (the Swedish Wikimedia chapter), understood that we will never be able to reach our vision – a world where every single person on our planet is given free access to the sum of all human knowledge – if Wikipedia remains silent. If the sum of all human knowledge is to be accessible to all humans on our planet, the Wikimedia platforms will need to start to speak.

What have we done so far?

So far, we have developed a Text to Speech tool for MediaWiki that uses only free and open-source software. The tool is up and running (you can try it out here!), and currently supports Arabic, English and Swedish. As the goal is to make the tool a part of MediaWiki, readers won't need any extra equipment or particularly powerful devices to use it. The actual Text to Speech creation will be handled by the site, meaning that it is not limited by the device you are using. Existing Text to Speech solutions cannot achieve that, so their availability varies greatly depending on where you live and what language you speak.

The tool has been developed as part of two projects. Wikispeech 2016 aimed to add Text to Speech to Wikipedia and other Wikimedia projects. Wikispeech 2019 aimed to create tools for contributing freely licensed speech data from volunteers, data which can then be used to improve the Text to Speech in Wikispeech and other speech technology projects, and in the end also create new voices.

What will we do next?

We are already done with the first step. There is a demo, and you can listen to it here. In the foreseeable future, this will become a beta function on the MediaWiki servers, making it possible to listen not only to the few given test pages, but to articles in general on the Swedish, English and Arabic Wikipedias.

Where do we go from here? The development in the short term will take place in a few steps:

We will collect feedback on the beta function from users, and improve the functionality according to the collected feedback.
We will improve the speech data collector, so that you can correct mistakes in the synthesis.
We will also develop material and ways of collecting speech data for improved voices and new languages.
We are part of a project to use machine learning and artificial intelligence to make the voices especially adapted to long texts, such as novels – and Wikipedia articles. Existing voices and speech syntheses are usually developed for shorter texts and instructions.

In the longer term, there are several interesting opportunities. Could Wikispeech be used to collect lexemes on Wikidata? Or oral citations on Wikipedia? Perhaps it can be a tool for language revitalization, hand in hand with the fantastic work already done in almost 300 Wikipedia language versions? All of these are not just possibilities but feasible ones. Where we go will depend on us as a community.

What is unique about Wikispeech?

Across the world, several Text to Speech solutions have been developed by commercial actors. Why do we need Wikispeech?

There are several answers to this question. To begin with, users and editors, in line with the Wikimedian ideology, will be able to contribute. Wikispeech will not be a one-way solution. The community will, for example, be able to correct errors when they are encountered. That means that, with help from users and editors, the technology will be able to better understand when to pronounce bass as »fish« and when to pronounce bass as »sound«, or when to pronounce row as »things in a more or less straight line« and when to pronounce row as »a fight or heated argument«. The community will also be able to contribute speech data to improve the technology. Currently, speech syntheses are highly uniform. Together, we can build and improve voices in Indian English, French as spoken in Côte d'Ivoire or Swahili as spoken on Zanzibar. In the long run, you will be able to hear Wikipedia spoken in the dialect that you are used to. Or perhaps in a dialect that you are not used to, in order to improve your understanding. That also means that we can simulate elderly male voices, or young female voices. In short, with the help of the community, the speech of Wikispeech can be made as diverse as its users – to the benefit of everyone listening.

Wikipedia also currently exists in 300 languages, while Wikispeech currently supports three. But one especially important aspect of Wikispeech is that it is built so that adding more languages will be relatively easy. Most commercial solutions cover the large European languages, but few of them properly reflect the linguistic diversity of our world. Commercial companies will probably not develop speech syntheses for Northern Sami, Haitian Creole or Aboriginal languages of Australia. In short, we can prioritize according to other motives than most commercial companies, in order to serve the diversity of our communities.

Long-term aims and purposes

"By 2030, Wikimedia will become the essential infrastructure of the ecosystem of free knowledge,
and anyone who shares our vision will be able to join us."

Thus reads the strategic direction of Wikimedia 2030. We believe that Wikispeech will be an important piece of the puzzle: an essential infrastructure of the ecosystem of free knowledge will need technology for means of communication other than written text.

We are just at the beginning of Wikispeech! We see a lot of potential, both in the short and the long term. The obvious, overarching aim is to build a Text to Speech tool that works in as many as possible of the languages that have a Wikimedia project. The project will increase the accessibility of one of the most important websites. All other platforms using MediaWiki will be able to make use of the technical solutions developed during the project. That means several thousand websites will be able to activate text-to-speech quickly and easily.

With the open nature of the project, it will be possible to develop new ways of presenting spoken information. Perhaps you could listen to Wikipedia articles on your phone while training at the gym? Or create an audiobook on a certain topic, drawing on information from various Wikimedia platforms, as a way to present engaging information in audio form? With the software and data openly licensed, there are plenty of possibilities for creative people to take this idea further.

We don't work in a vacuum. For us, Wikispeech might be the internal goal. For the rest of the world, we hope it will be an enabler.

Sustainable Development Goals

In 2015, almost all the countries in the world agreed on the seventeen Sustainable Development Goals. The goals are instrumental in the work to fulfill Agenda 2030, the agreed agenda from the United Nations on how to reach a sustainable world before the end of this decade.

Wikispeech will be important for many of the SDGs. Let us focus on three of them: Quality education; Industry, innovation and infrastructure; and Reduced inequalities. Wikispeech will make it possible for more people to gain knowledge from the Wikimedia content – which, as you may recall, we aim to make the sum of all human knowledge. It will radically increase the number of people who can access free knowledge and educate themselves. In that way, it will also contribute to reducing inequalities. And all this will be achieved through innovation in IT and internet infrastructure. On a general level, Wikispeech will be an important step towards a more sustainable world.

Accessibility

Across the world, many governments and international organizations strive towards increased digital accessibility. One example is the EU Web Accessibility Directive, which requires all public sector bodies within the EU to make their websites and mobile applications accessible. Wikispeech would make it possible to use MediaWiki installations for external or internal communication, and still fulfill the Web Accessibility Directive. The software that the Wikimedia Movement has developed, built on open source and proven functional for two decades, could with Wikispeech become a tool not only for the movement itself, but a central tool in the ecosystem of the internet. An essential infrastructure of the ecosystem of free knowledge.