User:MPopov (WMF)/Notes/Incubator test wikis

Motivation: Caroline's question posted in #working-with-data in Slack

Question: How many RTL languages have test wikis in the Incubator?


As of June 19, 2024:

Language directionality | Languages with 1+ test wiki(s)
Cyrillic (LTR?) | 1 (neg)
Vertical (Letters: TTB, Lines: LTR) | 3
Left-to-right | 576
Right-to-left | 31

Caveat: These counts only include test wikis that satisfy at least one of the following criteria:

  • They are substantial (having at least 25 mainspace pages), and/or
  • They are active (having some mainspace page creation since the beginning of 2023).
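
The two criteria combine with a logical OR. A minimal Python sketch of that rule follows. It assumes two hypothetical per-wiki inputs (`mainspace_pages` and `last_page_created`) that are not part of the scraped data described in these notes, and it reads "active" as at least one mainspace page created on or after January 1, 2023:

```python
from datetime import date
from typing import Optional

MIN_PAGES = 25                    # "substantial" threshold from the caveat
ACTIVE_SINCE = date(2023, 1, 1)   # "since the beginning of 2023"


def is_counted(mainspace_pages: int, last_page_created: Optional[date]) -> bool:
    """Return True if a test wiki is substantial and/or active.

    Both parameters are hypothetical inputs chosen for illustration.
    """
    substantial = mainspace_pages >= MIN_PAGES
    active = last_page_created is not None and last_page_created >= ACTIVE_SINCE
    return substantial or active


print(is_counted(30, date(2021, 5, 1)))   # substantial only -> True
print(is_counted(3, date(2024, 2, 10)))   # active only -> True
print(is_counted(3, date(2019, 8, 1)))    # neither -> False
```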



The data will come entirely from a single page listing the test wikis, which we will "scrape" using JavaScript and then analyze separately.

When the page loads, all the test wikis are collapsed/hidden. Each one can be expanded by clicking its "[show]" link, which loads that test wiki's information. We can expand all of them at once by running the following JavaScript in the browser console, which triggers the click event on each [show] link:

$("td a.att-toggle:contains('[show]')").each(function(index) {
  $(this).click(); // trigger the click event on this [show] link
});

It will take a minute or two to load all of the test wikis' information. Once everything has loaded, we can extract the ISO 639-3 language code and directionality for each test wiki, storing those two pieces of data in the array testwiki_languages:

var testwiki_languages = [];

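$(".testwiki-language").each(function(index) {
  var lang = {
    iso_639_3: $(this).find("kbd a").text(),
    // Strip the "Directionality: " label so only the value remains
    directionality: $(this).find("ul li:contains('Directionality')").text().replace('Directionality: ', '')
  };
  testwiki_languages.push(lang);
});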

To get that data into R or Python, we need to stringify it into a JSON representation:

JSON.stringify(testwiki_languages)

We can copy the output by right-clicking in the console and selecting Copy Message. For data analysis in Python you would use:

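from io import StringIO

import pandas as pd

# pandas 2.1+ deprecates passing a literal JSON string, so wrap it in StringIO
testwiki_languages = pd.read_json(StringIO('[{"iso_639_3":…]'))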
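
Before analysis it may be worth checking the pasted string for incomplete records: jQuery's `.text()` returns an empty string when a selector matches nothing, so a test wiki without a language link or a Directionality item would end up with an empty field. A minimal sketch using only the standard library is shown here. The sample string, its directionality values, the placeholder code "qaa", and the helper name `find_incomplete` are all made up for illustration and are not scraped data:

```python
import json

# Hypothetical stand-in for the copied console output (not real scraped data).
sample = (
    '[{"iso_639_3":"ary","directionality":"rtl"},'
    '{"iso_639_3":"qaa","directionality":""}]'
)


def find_incomplete(records):
    """Return the records whose language code or directionality is empty."""
    return [
        r for r in records
        if not r.get("iso_639_3") or not r.get("directionality")
    ]


records = json.loads(sample)
incomplete = find_incomplete(records)
print(incomplete)  # the "qaa" record, whose directionality is empty
```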

But we are going to do the analysis with R:

library(jsonlite)  # assumption: fromJSON() here is the one from jsonlite
testwiki_languages <- fromJSON('[{"iso_639_3":…]')

(In both of these cases the full and rather lengthy string is omitted.) Finally, let's count languages by directionality – keeping in mind that due to how we compiled our dataset it will have duplicates of languages if there are multiple projects incubating for a language (e.g. Moroccan Arabic Wikibooks, Moroccan Arabic Wikiquote, Moroccan Arabic Wiktionary):


library(dplyr)  # assumption: distinct() and count() are the dplyr verbs
testwiki_languages |>
  distinct(iso_639_3, directionality) |>
  count(directionality)
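
For readers following the Python route, the same deduplicate-then-count idea can be sketched with the standard library alone. The records below are made up in the same shape as the scraped data (the directionality values and the placeholder codes "qaa" and "qab" are assumptions), and `count_by_directionality` is a hypothetical helper name:

```python
import json
from collections import Counter

# Made-up sample; "ary" (Moroccan Arabic) appears three times,
# once per incubating project.
sample = json.loads("""[
  {"iso_639_3": "ary", "directionality": "rtl"},
  {"iso_639_3": "ary", "directionality": "rtl"},
  {"iso_639_3": "ary", "directionality": "rtl"},
  {"iso_639_3": "qaa", "directionality": "ltr"},
  {"iso_639_3": "qab", "directionality": "ltr"}
]""")


def count_by_directionality(records):
    """Count distinct languages per directionality."""
    # A set of (code, directionality) pairs drops the per-project duplicates,
    # mirroring distinct(iso_639_3, directionality).
    pairs = {(r["iso_639_3"], r["directionality"]) for r in records}
    return Counter(direction for _, direction in pairs)


counts = count_by_directionality(sample)
print(dict(counts))
```

With this sample, "rtl" is counted once even though it appears in three records, which is the reason for deduplicating before counting.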