Research:Develop a model for text simplification to improve readability of Wikipedia articles/First round of experiments

We develop a model for text simplification by fine-tuning an existing pre-trained language model for our specific task at hand.

As a backbone model we use Flan-T5[1], an enhanced version of T5 that has been fine-tuned on a mixture of tasks. According to its documentation, it supports 60 languages, so it can, in principle, also be applied beyond English in a multilingual setting. The model has also been shown to perform exceptionally well in meeting summarization[2]: “Our experimental results show that most smaller LLMs, even after fine-tuning, fail to outperform larger zero-shot LLMs in meeting summarization datasets. However, a notable exception is a fine-tuned FLAN-T5-Large, which achieves performance on par with much larger LLMs (from 7B to more than 70B) used in zero-shot settings, while being significantly smaller. This makes smaller LLMs like FLAN-T5 a suitable cost-efficient LLM for real-world deployment.”

We then train and evaluate the model on pairs of Wikipedia articles (original version, simplified version) as a sequence-to-sequence model, where the input is the original version and the output is the simplified version.

We quantitatively evaluate the models using three common metrics (for each, higher is better):

  • SARI: a metric for evaluating automatic text simplification systems. It compares the predicted simplified sentences against both the reference and the source sentences, and explicitly measures the quality of the words that are added, deleted and kept by the system.
  • D-SARI: an adapted version of SARI that is claimed to be more suitable for document-level simplification.
  • BLEU (bilingual evaluation understudy): a metric borrowed from machine translation that judges the quality of translated text. Unlike SARI, it compares the output only against the reference, not the source. It is commonly used as a complementary metric to SARI, as it has been shown to correlate with judgements of grammaticality and meaning preservation of the simplified text[3].

In addition, we provide examples of simplification for qualitative evaluation.
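To make the keep/add/delete idea behind SARI concrete, here is a deliberately simplified sketch in Python. It is not the implementation used for the scores in this report: it assumes a single reference, whitespace tokenization and unigrams only, and treats words as sets, whereas SARI proper averages over n-grams of length 1 to 4 and supports multiple references. The function name `sari_unigram` is hypothetical. Following a convention used in some implementations, an empty denominator (0/0) counts as a perfect score.

```python
def sari_unigram(source: str, prediction: str, reference: str) -> float:
    """Simplified, unigram-only SARI for a single reference (sketch, 0-100)."""
    src = set(source.lower().split())
    pred = set(prediction.lower().split())
    ref = set(reference.lower().split())

    def ratio(num: int, den: int) -> float:
        # Convention: 0/0 counts as 1.0, so an exact match can reach 100.
        return num / den if den else 1.0

    def f1(p: float, r: float) -> float:
        return 2 * p * r / (p + r) if p + r > 0 else 0.0

    # ADD: words in the prediction (or reference) that are not in the source.
    add_pred, add_ref = pred - src, ref - src
    add_good = add_pred & add_ref
    add_f1 = f1(ratio(len(add_good), len(add_pred)), ratio(len(add_good), len(add_ref)))

    # KEEP: source words that survive in the prediction (or reference).
    keep_pred, keep_ref = pred & src, ref & src
    keep_good = keep_pred & keep_ref
    keep_f1 = f1(ratio(len(keep_good), len(keep_pred)), ratio(len(keep_good), len(keep_ref)))

    # DELETE: source words dropped by the prediction (or reference); precision only.
    del_pred, del_ref = src - pred, src - ref
    del_precision = ratio(len(del_pred & del_ref), len(del_pred))

    return 100 * (add_f1 + keep_f1 + del_precision) / 3


source = "the cat sat on the mat"
reference = "the cat sat"
print(round(sari_unigram(source, reference, reference), 2))  # 100.0 (deletes exactly what the reference deletes)
print(round(sari_unigram(source, source, reference), 2))     # 91.67 (unchanged copy of the source)
```

Copying the source unchanged scores lower here because the words it keeps but the reference deletes ("on", "mat") reduce the keep precision.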

Experiments in English


As a first step, we test the simplification on different datasets in English.

D-Wikipedia benchmark


We first use the D-Wikipedia[4] benchmark dataset for training and evaluating our model. Since this dataset has been used in previous work on simplification models, it gives us a reference point for comparing our model's performance. The dataset contains 132,546/3,000/8,000 pairs of articles in the training/validation/test sets.

Quantitative evaluation of simplification on the D-Wikipedia dataset
Model | SARI | D-SARI | BLEU
BART (Sun et al.) | 48.34 | 37.24 | 31.77
flan-t5-base | 46.9 | 32.3 | 28.5

Examples:

# Notes Source Text
3228 Parsing issues: Missing text from parsing wikitext Original the arrondissement of nogent-sur-seine is an arrondissement of france in the aube department in the grand est region . it has 79 communes . its population is 54,067 ( 2016 ) , and its area is .
Simplified (Model)

the arrondissement of nogent-sur-seine is an arrondissement of france in the aube department in the grand est region. it has 79 communes. its population is 54,067 ( 2016 ) and its area is.
Simplified (Data)

the arrondissement of nogent-sur-seine is an arrondissement of france . it is in the aube department , grand est region . its capital is the city of nogent-sur-seine
6986 Filtering issues: articles contain only a single sentence Original the british rail class 83 electric locomotives were built by english electric at vulcan foundry , newton-le-willows as part of the west coast main line electrification .
Simplified (Model)

the british rail class 83 electric locomotives were built by english electric at vulcan foundry, newton-le-willows as part of the west coast main line electrification.
Simplified (Data)

the british rail class 83 electric locomotives were built by english electric at vulcan foundry , newton-le-willows as part of the west coast main line electrification .
1610 Filtering issues: Samples contain disambiguation pages Original kaffir or kafir may refer to :
Simplified (Model)

kaffir or kafir refers to :   
Simplified (Data)

kaffir may be meaning :

Observations:

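  • Our fine-tuned model achieves scores comparable to, though somewhat below, the best previously reported model (BART).
  • The output is not well formatted and is therefore not suited for use in practice:
    • The text is lowercase, as is the D-Wikipedia data itself.
    • Some text is missing (e.g. the area in example 3228), most likely because the text was extracted from wikitext (see a similar issue reported here). A solution is to parse the text from the rendered HTML instead.
  • The data also contains unsuitable samples, e.g. articles consisting of a single sentence (6986) or disambiguation pages (1610).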
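A minimal sketch of the suggested fix, using only Python's standard-library `html.parser`: it collects the visible text inside `<p>` elements of an article's rendered HTML and skips `<script>`/`<style>` content. The class and function names are hypothetical, the example HTML is a toy snippet, and a real pipeline for Wikipedia HTML would likely need further rules (e.g. for references, infoboxes or tables). The idea is that template output, such as a converted area value, is already rendered as text in the HTML, whereas it is typically lost when extracting plain text from raw wikitext.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphTextExtractor(HTMLParser):
    """Collects whitespace-normalized text from <p> elements (sketch).

    Assumes every <p> is explicitly closed, as in well-formed HTML.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.paragraphs: List[str] = []
        self._in_paragraph = False
        self._skip_depth = 0
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False
        elif tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Inline elements (<span>, <a>, <sup>, ...) contribute their text as well.
        if self._in_paragraph and self._skip_depth == 0:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> List[str]:
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


print(extract_paragraphs("<p>The lake has an area of <span>100 km<sup>2</sup></span>.</p>"))
# ['The lake has an area of 100 km2.']
```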

Custom Simple-English-Wikipedia Dataset


We use a dataset of pairs of articles from English Wikipedia and Simple English Wikipedia (SEW), taken from our work on automatically scoring the readability of Wikipedia articles. The articles from enwiki and simplewiki are matched via their Wikidata ID. We extract the plain text of the lead section from the HTML version of each article. We remove list and disambiguation pages and keep only pairs in which both articles have at least 3 sentences.

The dataset contains 89,873/11,234/11,235 pairs of articles in the training/validation/test set.
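The pairing and filtering steps described above can be sketched as follows. This is an illustration under assumptions, not the code used to build the dataset: articles are given as hypothetical dictionaries keyed by Wikidata ID, each holding the lead-section plain text and a flag marking list or disambiguation pages, and sentences are counted with a naive regular expression (which, for example, miscounts abbreviations such as "e.g.").

```python
import re
from typing import Dict, List, Tuple

# Hypothetical record per article: (lead-section plain text, is_list_or_disambiguation_page)
Article = Tuple[str, bool]


def count_sentences(text: str) -> int:
    # Naive splitter: a sentence ends with ".", "!" or "?" followed by whitespace.
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def build_pairs(
    enwiki: Dict[str, Article],
    simplewiki: Dict[str, Article],
    min_sentences: int = 3,
) -> List[Tuple[str, str, str]]:
    """Match articles by Wikidata ID and return (qid, original, simplified) triples."""
    pairs = []
    for qid in sorted(enwiki.keys() & simplewiki.keys()):  # Wikidata IDs present in both wikis
        en_text, en_excluded = enwiki[qid]
        simple_text, simple_excluded = simplewiki[qid]
        if en_excluded or simple_excluded:  # drop list and disambiguation pages
            continue
        if min(count_sentences(en_text), count_sentences(simple_text)) < min_sentences:
            continue  # both versions need at least `min_sentences` sentences
        pairs.append((qid, en_text, simple_text))
    return pairs
```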

Quantitative evaluation of simplification on the custom SEW dataset
Model | SARI | D-SARI | BLEU
flan-t5-small | 45.0 | 28.0 | 26.5
flan-t5-base | tba | tba | tba
flan-t5-large | 46.9 | 33.0 | 27.0
flan-t5-xl | 47.0 | 32.2 | 27.0

Example output

# Notes Source Text
18887 "Reasonable" simplification Original Stadium Australia (currently known as Accor Stadium for sponsorship purposes) is a multi-purpose stadium located in the Sydney Olympic Park, in Sydney, Australia. The stadium, which in Australia is sometimes referred to as Sydney Olympic Stadium, Homebush Stadium or simply the Olympic Stadium, was completed in March 1999 at a cost of A$690 million to host the 2000 Summer Olympics. The Stadium was leased by a private company, the Stadium Australia Group, until the Stadium was sold back to the NSW Government on 1 June 2016 after NSW Premier Michael Baird announced the Stadium was to be redeveloped as a world-class rectangular stadium. The Stadium is owned by Venues NSW on behalf of the NSW Government. The stadium was originally built to hold circa 115,000 spectators, making it the largest Olympic Stadium ever built and the second largest stadium in Australia after the Melbourne Cricket Ground which held more than 120,000 before its re-design in the early 2000s. In 2003, reconfiguration work was completed to shorten the north and south wings, and install movable seating. These changes reduced the capacity to 80,000, with the capacity to add seating depending on the venue configuration. Awnings were also added over the north and south stands, allowing most of the seating to be under cover. The stadium was engineered along sustainable lines, e.g., utilising less steel in the roof structure than the Olympic stadiums of Athens and Beijing.
Simplified (Model)

Stadium Australia is a multi-purpose stadium in Sydney, Australia. It was built to host the 2000 Summer Olympics. The stadium was completed in March 1999 at a cost of A$690 million to host the 2000 Summer Olympics. The Stadium was leased by a private company, the Stadium Australia Group, until the Stadium was sold back to the NSW Government on 1 June 2016. The Stadium is owned by Venues NSW on behalf of the NSW Government.
Simplified (Data)

Stadium Australia, known for sponsorship reasons as ANZ Stadium, formerly Telstra Stadium, is a sports stadium in Sydney, Australia. The 2000 Summer Olympics took place there. Now it is used for many other sports, like rugby and AFL. The stadium opened in March 1999. In 2002, the stadium was named after the sponsor Telstra. 1 January 2008 it was renamed the ANZ Stadium. As "Stadium Australia", the name was sometimes confused with the Sydney Football Stadium, which was formerly known as Aussie Stadium.
68361 Issue: Too extreme summarization (simplification much shorter) Original Paraná (Brazilian Portuguese: [paɾɐˈna] (listen)) is one of the 26 states of Brazil, in the south of the country. It is bordered in the north by São Paulo state, in the east by the Atlantic Ocean, in the south by Santa Catarina state and the province of Misiones, Argentina, and in the west by Mato Grosso do Sul and Paraguay, with the Paraná River as its western boundary. It is subdivided into 399 municipalities, and its capital is the city of Curitiba. Other major cities are Londrina, Maringá, Ponta Grossa, Cascavel, São José dos Pinhais and Foz do Iguaçu. The state is home to 5.4% of the Brazilian population and generates 6.2% of the Brazilian GDP. Crossed by the Tropic of Capricorn, Paraná has what is left of the araucaria forest, one of the most important subtropical forests in the world. At the border with Argentina is the National Park of Iguaçu, considered by UNESCO as a World Heritage Site. 40 km (25 mi) from there, at the border with Paraguay, the largest dam in the world was built, the Hidroelétrica de Itaipu (Itaipu Hydroelectric Dam). The crime rate is considered low by Brazilian standards and the state is one of the most developed ones in the nation, ranking 4th in gross domestic product, only behind the states of Rio de Janeiro, São Paulo and Minas Gerais.
Simplified (Model)

Paraná is one of the 26 states of Brazil. It is in the south of the country. Its capital is the city of Curitiba. Other major cities are Londrina, Maringá, Ponta Grossa, Cascavel, So José dos Pinhais and Foz do Iguaçu.
Simplified (Data)

Paraná is a state in Southern Brazil. It is next to Paraguay and Argentina. Its capital is Curitiba. Paraná is a famous state because of Itaipu Dam and the Iguaçu Falls. Paraná is bounded on the north by São Paulo, on the east by the Atlantic Ocean, on the south by Santa Catarina and the Misiones Province of Argentina, and on the west by Mato Grosso do Sul and the republic of Paraguay, with the Paraná River as its western boundary line. Paraná is one of the Brazilian states that shows the most signs of recent European colonization. Most of the population are descendants of European settlers, and is one of the most ethnically diverse. Waves of European immigrants started arriving after 1850, mainly Germans, Italians, Poles and Ukrainians. The development of the state is closely linked to the arrival of the immigrants. Nowadays, migrants from other Brazilian states (especially from the São Paulo, Santa Catarina, Rio Grande do Sul and the Northeast Region) come to Paraná in search of a better life. People of Polish descent predominate in the central and south regions, mainly around Curitiba. Italians predominate in the capital, Curitiba, and in the coast. People of Ukrainian descent predominate in a few cities, such as Prudentópolis; many Ukrainians are found throughout the state. Germans predominate in a few cities, such as Rolândia. Almost all towns have many Portuguese people and their descendants. People of mixed-race ancestry predominate in a few towns in the north. There is an important Muslim Lebanese and Syrian community around Foz do Iguaçu. In the north, there are many Japanese people.
49778 Issue: Repetition Original A well is an excavation or structure created in the ground by digging, driving, or drilling to access liquid resources, usually water. The oldest and most common kind of well is a water well, to access groundwater in underground aquifers. The well water is drawn up by a pump, or using containers, such as buckets or large water bags that are raised mechanically or by hand. Water can also be injected back into the aquifer through the well. Wells were first constructed at least eight thousand years ago and historically vary in construction from a simple scoop in the sediment of a dry watercourse to the qanats of Iran, and the stepwells and sakiehs of India. Placing a lining in the well shaft helps create stability, and linings of wood or wickerwork date back at least as far as the Iron Age. Wells have traditionally been sunk by hand digging, as is still the case in rural areas of the developing world. These wells are inexpensive and low-tech as they use mostly manual labour, and the structure can be lined with brick or stone as the excavation proceeds. A more modern method called caissoning uses pre-cast reinforced concrete well rings that are lowered into the hole. Driven wells can be created in unconsolidated material with a well hole structure, which consists of a hardened drive point and a screen of perforated pipe, after which a pump is installed to collect the water. Deeper wells can be excavated by hand drilling methods or machine drilling, using a bit in a borehole. Drilled wells are usually cased with a factory-made pipe composed of steel or plastic. Drilled wells can access water at much greater depths than dug wells. Two broad classes of well are shallow or unconfined wells completed within the uppermost saturated aquifer at that location, and deep or confined wells, sunk through an impermeable stratum into an aquifer beneath. 
A collector well can be constructed adjacent to a freshwater lake or stream with water percolating through the intervening material. The site of a well can be selected by a hydrogeologist, or groundwater surveyor. Water may be pumped or hand drawn. Impurities from the surface can easily reach shallow sources and contamination of the supply by pathogens or chemical contaminants needs to be avoided. Well water typically contains more minerals in solution than surface water and may require treatment before being potable. Soil salination can occur as the water table falls and the surrounding soil begins to dry out. Another environmental problem is the potential for methane to seep into the water.
Simplified (Model)

A well is a hole in the ground to get water. People dig wells to get water from underground. Wells are usually deep enough to get water from a lake or river. Wells can be dug by hand or by a machine. Wells can be dug to get water from a lake or river. Wells can also be dug to get water from a place where there is no water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is not much water. Wells can be dug to get water from a place where there is a lot of water. Wells can be dug to get water from a place where there is a lot of water.
Simplified (Data)

A well is a hole that is dug into the Earth to get a fluid. Most wells are to get water. There are also oil well s and gas wells. A qanat is an ancient complex water well system used in the Middle East. Wells can be as simple as a hole that a bucket on a rope can be lowered into, or very complex with pipes and high-powered pumps to get the water out. Most cities that are not close to fresh water lakes or rivers get their water from wells. It is important to be careful what rubbish is put into the ground near a well. If something toxic is put in the ground, it could go in the groundwater and the well and make people sick. Water is a problem for many African countries. Many charities are helping to build wells in local villages need not go far to get water. A well only works if underneath there is an aquifer which feeds it.

Observations:

  • In many cases, the model performs very little simplification; the output is almost identical to the original.
  • Extreme summarization: the model simplifies but also reduces the text to only a few sentences. This is probably undesirable, as it removes too much information from the text instead of simplifying it.
  • Repetition: the model's output repeats the same sentences over and over (see example 49778). This is clearly an error.
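Repetitive outputs such as example 49778 could be flagged automatically. A minimal sketch, assuming a naive regex sentence splitter and a hypothetical threshold: compute the share of sentences that are exact (case-insensitive) duplicates of an earlier sentence. Decoding options such as `no_repeat_ngram_size` or `repetition_penalty` in Hugging Face `generate()` are a possible complementary remedy on the generation side.

```python
import re


def repeated_sentence_ratio(text: str) -> float:
    """Share of sentences that repeat an earlier sentence (0.0 = no repetition)."""
    sentences = [s.strip().lower() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    if not sentences:
        return 0.0
    return 1 - len(set(sentences)) / len(sentences)


def is_repetitive(text: str, threshold: float = 0.3) -> bool:
    # Hypothetical threshold; it would need tuning on real model outputs.
    return repeated_sentence_ratio(text) > threshold
```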

Most likely, these results are an artifact of the training data. In many cases, the Simple Wikipedia version is either (i) very short or (ii) not much simpler than the original (e.g. when readability is measured with the Flesch–Kincaid grade level, FKGL). A model fine-tuned on this data learns to reproduce these patterns.
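For reference, the Flesch–Kincaid grade level is computed from the average sentence length and the average number of syllables per word: FKGL = 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, where lower values indicate easier text. The sketch below implements this formula with a rough vowel-group syllable heuristic and a naive sentence splitter. It is an approximation for illustration, not necessarily the tool behind the numbers in this report; libraries such as textstat provide a `flesch_kincaid_grade` function.

```python
import re


def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels (incl. "y"),
    # and drop a silent final "e" (but not in "-le" endings like "table").
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1 and not word.endswith("le"):
        n -= 1
    return max(n, 1)


def fkgl(text: str) -> float:
    """Approximate Flesch-Kincaid grade level of an English text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


print(round(fkgl("The cat sat."), 2))  # -2.62
```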

Notably, we also tested the larger Flan-T5-XL model (3B parameters). We trained it on external GPUs (T355440), as the model is too large to be trained on our internal infrastructure (at the time of writing). However, we can still run inference with the fine-tuned model on our infrastructure, since inference requires significantly fewer resources than training. This test demonstrated that we can, in principle, use larger models by first training them on external GPUs and then copying the model weights to our infrastructure to run inference. In this specific case, however, the much larger model does not yield substantial improvements in performance (compare flan-t5-xl with flan-t5-large in the table above).

Filtered Simple-English-Wikipedia Dataset


One potential solution to the issues observed above is to create a higher-quality dataset containing examples of simplifications that better capture the intended usage.

For this, we filter the training data to only contain pairs of original/simplified versions where:

  • the version from Simple Wikipedia is substantially simpler than the version from English Wikipedia: we only keep samples where the FKGL decreases by 5 or more points (a lower FKGL means easier text);
  • the version from Simple Wikipedia is not too short compared to the version from English Wikipedia: we only keep samples where the length (in characters) is reduced by at most 66%.

This reduces the data to 12,273 training pairs and 1,469 test pairs, i.e. only about 14% of the original 89,873 training pairs are kept. This shows that only a small fraction of the original pairs constitute “good” examples of simplification according to these criteria.
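The two filtering criteria can be expressed as a simple predicate. A sketch, assuming FKGL scores have already been computed for both versions; the `Pair` record and the `keep_pair` function are hypothetical. A length reduction of at most 66% means the simplified text keeps at least 34% of the original number of characters.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    original: str
    simplified: str
    fkgl_original: float
    fkgl_simplified: float


def keep_pair(pair: Pair, min_fkgl_drop: float = 5.0, max_length_reduction: float = 0.66) -> bool:
    """True if the simplified version is clearly easier but not drastically shorter."""
    if not pair.original:
        return False  # avoid division by zero on empty articles
    fkgl_drop = pair.fkgl_original - pair.fkgl_simplified  # lower FKGL = easier text
    length_reduction = 1 - len(pair.simplified) / len(pair.original)  # in characters
    return fkgl_drop >= min_fkgl_drop and length_reduction <= max_length_reduction
```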

Quantitative evaluation of simplification on the filtered SEW dataset
Model | SARI | D-SARI | BLEU
flan-t5-large | 45.7 | 36.5 | 20.8


Some examples (the same articles as considered above):

# Notes Source Text
18887 FKGL: 12 (original) --> 8 (simplified) Original Stadium Australia (currently known as Accor Stadium for sponsorship purposes) is a multi-purpose stadium located in the Sydney Olympic Park, in Sydney, Australia. The stadium, which in Australia is sometimes referred to as Sydney Olympic Stadium, Homebush Stadium or simply the Olympic Stadium, was completed in March 1999 at a cost of A$690 million to host the 2000 Summer Olympics. The Stadium was leased by a private company, the Stadium Australia Group, until the Stadium was sold back to the NSW Government on 1 June 2016 after NSW Premier Michael Baird announced the Stadium was to be redeveloped as a world-class rectangular stadium. The Stadium is owned by Venues NSW on behalf of the NSW Government. The stadium was originally built to hold circa 115,000 spectators, making it the largest Olympic Stadium ever built and the second largest stadium in Australia after the Melbourne Cricket Ground which held more than 120,000 before its re-design in the early 2000s. In 2003, reconfiguration work was completed to shorten the north and south wings, and install movable seating. These changes reduced the capacity to 80,000, with the capacity to add seating depending on the venue configuration. Awnings were also added over the north and south stands, allowing most of the seating to be under cover. The stadium was engineered along sustainable lines, e.g., utilising less steel in the roof structure than the Olympic stadiums of Athens and Beijing.
Simplified (Model)

The Sydney Olympic Stadium is a multi-purpose stadium in Sydney, Australia. It was built in 1999 to host the 2000 Summer Olympics. The stadium was leased by a private company, the Stadium Australia Group, until it was sold back to the NSW Government on 1 June 2016. The stadium was originally built to hold around 115,000 spectators. It was the largest Olympic Stadium ever built and the second largest stadium in Australia after the Melbourne Cricket Ground which held more than 120,000 before its re-design in the early 2000s. In 2003, reconfiguration work was completed to shorten the north and south wings, and install movable seating. These changes reduced the capacity to 80,000, with the capacity to add seating depending on the venue configuration. Awnings were also changed.
Simplified (Data)

Stadium Australia, known for sponsorship reasons as ANZ Stadium, formerly Telstra Stadium, is a sports stadium in Sydney, Australia. The 2000 Summer Olympics took place there. Now it is used for many other sports, like rugby and AFL. The stadium opened in March 1999. In 2002, the stadium was named after the sponsor Telstra. 1 January 2008 it was renamed the ANZ Stadium. As "Stadium Australia", the name was sometimes confused with the Sydney Football Stadium, which was formerly known as Aussie Stadium.
68361 FKGL: 9 (original) --> 7 (simplified), i.e. Very little simplification Original Paraná (Brazilian Portuguese: [paɾɐˈna] (listen)) is one of the 26 states of Brazil, in the south of the country. It is bordered in the north by São Paulo state, in the east by the Atlantic Ocean, in the south by Santa Catarina state and the province of Misiones, Argentina, and in the west by Mato Grosso do Sul and Paraguay, with the Paraná River as its western boundary. It is subdivided into 399 municipalities, and its capital is the city of Curitiba. Other major cities are Londrina, Maringá, Ponta Grossa, Cascavel, São José dos Pinhais and Foz do Iguaçu. The state is home to 5.4% of the Brazilian population and generates 6.2% of the Brazilian GDP. Crossed by the Tropic of Capricorn, Paraná has what is left of the araucaria forest, one of the most important subtropical forests in the world. At the border with Argentina is the National Park of Iguaçu, considered by UNESCO as a World Heritage Site. 40 km (25 mi) from there, at the border with Paraguay, the largest dam in the world was built, the Hidroelétrica de Itaipu (Itaipu Hydroelectric Dam). The crime rate is considered low by Brazilian standards and the state is one of the most developed ones in the nation, ranking 4th in gross domestic product, only behind the states of Rio de Janeiro, São Paulo and Minas Gerais.
Simplified (Model)

Paraná (Brazilian Portuguese: [pana] (listen)) is one of the 26 states of Brazil. It is in the south of the country. It is bordered in the north by So Paulo state, in the east by the Atlantic Ocean, in the south by Santa Catarina state and the province of Misiones, Argentina, and in the west by Mato Grosso do Sul and Paraguay, with the Paraná River as its western boundary. It is subdivided into 399 municipalities, and its capital is the city of Curitiba. Other major cities are Londrina, Maringá, Ponta Grossa, Cascavel, So José dos Pinhais and Foz do Iguaçu. The state is home to 5.4% of the Brazilian population and generates 6.2% of the Brazilian GDP.
Simplified (Data)

Paraná is a state in Southern Brazil. It is next to Paraguay and Argentina. Its capital is Curitiba. Paraná is a famous state because of Itaipu Dam and the Iguaçu Falls. Paraná is bounded on the north by São Paulo, on the east by the Atlantic Ocean, on the south by Santa Catarina and the Misiones Province of Argentina, and on the west by Mato Grosso do Sul and the republic of Paraguay, with the Paraná River as its western boundary line. Paraná is one of the Brazilian states that shows the most signs of recent European colonization. Most of the population are descendants of European settlers, and is one of the most ethnically diverse. Waves of European immigrants started arriving after 1850, mainly Germans, Italians, Poles and Ukrainians. The development of the state is closely linked to the arrival of the immigrants. Nowadays, migrants from other Brazilian states (especially from the São Paulo, Santa Catarina, Rio Grande do Sul and the Northeast Region) come to Paraná in search of a better life. People of Polish descent predominate in the central and south regions, mainly around Curitiba. Italians predominate in the capital, Curitiba, and in the coast. People of Ukrainian descent predominate in a few cities, such as Prudentópolis; many Ukrainians are found throughout the state. Germans predominate in a few cities, such as Rolândia. Almost all towns have many Portuguese people and their descendants. People of mixed-race ancestry predominate in a few towns in the north. There is an important Muslim Lebanese and Syrian community around Foz do Iguaçu. In the north, there are many Japanese people.
49778 FKGL: 10 (original) --> 5 (simplified) Original A well is an excavation or structure created in the ground by digging, driving, or drilling to access liquid resources, usually water. The oldest and most common kind of well is a water well, to access groundwater in underground aquifers. The well water is drawn up by a pump, or using containers, such as buckets or large water bags that are raised mechanically or by hand. Water can also be injected back into the aquifer through the well. Wells were first constructed at least eight thousand years ago and historically vary in construction from a simple scoop in the sediment of a dry watercourse to the qanats of Iran, and the stepwells and sakiehs of India. Placing a lining in the well shaft helps create stability, and linings of wood or wickerwork date back at least as far as the Iron Age. Wells have traditionally been sunk by hand digging, as is still the case in rural areas of the developing world. These wells are inexpensive and low-tech as they use mostly manual labour, and the structure can be lined with brick or stone as the excavation proceeds. A more modern method called caissoning uses pre-cast reinforced concrete well rings that are lowered into the hole. Driven wells can be created in unconsolidated material with a well hole structure, which consists of a hardened drive point and a screen of perforated pipe, after which a pump is installed to collect the water. Deeper wells can be excavated by hand drilling methods or machine drilling, using a bit in a borehole. Drilled wells are usually cased with a factory-made pipe composed of steel or plastic. Drilled wells can access water at much greater depths than dug wells. Two broad classes of well are shallow or unconfined wells completed within the uppermost saturated aquifer at that location, and deep or confined wells, sunk through an impermeable stratum into an aquifer beneath. 
A collector well can be constructed adjacent to a freshwater lake or stream with water percolating through the intervening material. The site of a well can be selected by a hydrogeologist, or groundwater surveyor. Water may be pumped or hand drawn. Impurities from the surface can easily reach shallow sources and contamination of the supply by pathogens or chemical contaminants needs to be avoided. Well water typically contains more minerals in solution than surface water and may require treatment before being potable. Soil salination can occur as the water table falls and the surrounding soil begins to dry out. Another environmental problem is the potential for methane to seep into the water.
Simplified (Model)

A well is a hole in the ground. It is used to get water from the ground. The most common kind of well is a water well, to get water from underground aquifers. The well water is drawn up by a pump, or using containers, such as buckets or large water bags that are raised mechanically or by hand. Water can also be injected back into the aquifer through the well. Wells were first constructed at least eight thousand years ago. They have been built in many different ways. They have been built in the qanats of Iran, and the stepwells and sakiehs of India. Wells have traditionally been sunk by hand digging, as is still the case in rural areas of the developing world. These wells are cheap and low-tech as they use mostly manual labour. The structure can be lined with brick or stone as the excavation progresses. A more modern method called cavern is used to dig wells.
Simplified (Data)

A well is a hole that is dug into the Earth to get a fluid. Most wells are to get water. There are also oil well s and gas wells. A qanat is an ancient complex water well system used in the Middle East. Wells can be as simple as a hole that a bucket on a rope can be lowered into, or very complex with pipes and high-powered pumps to get the water out. Most cities that are not close to fresh water lakes or rivers get their water from wells. It is important to be careful what rubbish is put into the ground near a well. If something toxic is put in the ground, it could go in the groundwater and the well and make people sick. Water is a problem for many African countries. Many charities are helping to build wells in local villages need not go far to get water. A well only works if underneath there is an aquifer which feeds it.

You can find the simplification results of the model on all articles in the test data in the repository: https://gitlab.wikimedia.org/repos/research/text-simplification/-/blob/main/df_evaluation_t5-flan-large_sew-filtered.csv

Observations:

  • From the quantitative measures, we see a substantial improvement in the D-SARI score (36.5 vs. 33.0 for flan-t5-large on the unfiltered data), while SARI is similar and BLEU decreases (20.8 vs. 27.0). Since the filtered test set differs from the unfiltered one, the scores are not strictly comparable. Still, this suggests that the filtering likely has a positive effect on document-level simplification.
  • Qualitative investigation of the results supports this. The model generates output that seems to capture substantial simplifications. At the same time, the output seems to be more stable, i.e. it suffers from fewer errors (repetition or extreme summarization).

Multilingual experiments


We test how well the Flan-T5 model performs at simplification in languages beyond English. The model card mentions 61 supported languages.

We use WikiReaD (Wikipedia Readability Dataset), which contains pairs of encyclopedic articles in 14 languages. Each pair consists of the same article at two levels of readability (easy/hard). The pairs are obtained by matching Wikipedia articles (hard) with the corresponding versions from different simplified or children's encyclopedias (easy).

We evaluate three variants of the Flan-T5-large model:

  • plain (no fine-tuning): no additional fine-tuning of the model. The input to the model is simply "Simplify in <LANGUAGE>: <ORIGINAL TEXT>"
  • English fine-tuning: the model is only fine-tuned on the English dataset (see above "Filtered Simple-English-Wikipedia Dataset")
  • Multilingual fine-tuning: the model is fine-tuned on the English dataset (same as above) and, additionally, on 100 randomly chosen samples from each of the non-English datasets.
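The input format of the plain model and the composition of the multilingual training set can be sketched as follows. Assumptions (none of these details is specified above): each dataset is a list of (original, simplified) pairs, languages are identified by their English names, the same prompt format is reused for the fine-tuning inputs, and sampling uses a fixed seed. The function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (original text, simplified text)


def make_prompt(language: str, text: str) -> str:
    # Input format described above for the plain (not fine-tuned) model.
    return f"Simplify in {language}: {text}"


def build_multilingual_training_set(
    english: List[Pair],
    other_languages: Dict[str, List[Pair]],
    n_per_language: int = 100,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """English pairs plus n randomly chosen pairs per non-English dataset (sketch)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    examples = [(make_prompt("English", orig), simple) for orig, simple in english]
    for language, pairs in sorted(other_languages.items()):
        sample = rng.sample(pairs, min(n_per_language, len(pairs)))
        examples.extend((make_prompt(language, orig), simple) for orig, simple in sample)
    rng.shuffle(examples)  # mix languages rather than training on them in blocks
    return examples
```

Using `min(n_per_language, len(pairs))` keeps the sketch from failing on a dataset with fewer than 100 pairs.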

Qualitative


We show one randomly chosen example from each dataset together with the output of each model variant:

simplewiki-en
  • Original: "Spirit in the Night" is a song written and originally recorded by American singer/songwriter Bruce Springsteen for his debut album Greetings from Asbury Park, N.J. (1973). It was also the second single released from the album. A cover version performed by Manfred Mann's Earth Band, was released on the album Nightingales and Bombers and as a Top 40 single.
  • Plain (no fine-tuning): Spirit in the Night is a song written and originally recorded by American singer/songwriter Bruce Springsteen for his debut album Greetings from Asbury Park, N.J. (1973). It was also the second single released from the album. A cover version performed by Manfred Mann's Earth Band, was released on the album Nightingales and Bombers and as a Top 40 single.
  • English fine-tuning: "Spirit in the Night" is a 1973 song by Bruce Springsteen. It was on his debut album Greetings from Asbury Park, N.J. It was also the second single from the album. A cover version was performed by Manfred Mann's Earth Band. It was released on the album Nightingales and Bombers and as a Top 40 single.
  • Multilingual fine-tuning: "Spirit in the Night" is a 1973 song by Bruce Springsteen. It was on his debut album Greetings from Asbury Park, N.J. It was also the second single from the album. A cover version by Manfred Mann's Earth Band was released on the album Nightingales and Bombers. It was a Top 40 single.

vikidia-en
  • Original: Lake Superior is the largest freshwater lake in the world by surface area and the third-largest by volume, holding 10% of the world's surface fresh water. Located in central North America, it is the northernmost and westernmost of the Great Lakes of North America, straddling the Canada–United States border with the Canadian province of Ontario to the north and east and the U.S. states of Minnesota to the west and Wisconsin and Michigan to the south. It drains into Lake Huron via St. Marys River, then through the lower Great Lakes to the St. Lawrence River and ultimately the Atlantic Ocean.
  • Plain (no fine-tuning): Lake Superior is the largest freshwater lake in the world by surface area and the third-largest by volume, holding 10% of the world's surface fresh water. Located in central North America, it is the northernmost and westernmost of the Great Lakes of North America, straddling the Canada–United States border with the Canadian province of Ontario to the north and east and the U.S. states of Minnesota to the west and Wisconsin and Michigan to the south. It drains into Lake Huron via St. Marys River, then through the lower Great Lakes to the St. Lawrence River and ultimately the Atlantic Ocean.
  • English fine-tuning: Lake Superior is the largest freshwater lake in the world by surface area. It is the third-largest by volume. It holds 10% of the world's surface fresh water. It is in central North America. It is the northernmost and westernmost of the Great Lakes of North America, straddling the Canada–United States border with the Canadian province of Ontario to the north and east and the U.S. states of Minnesota to the west and Wisconsin and Michigan to the south. It drains into Lake Huron via St. Marys River, then through the lower Great Lakes to the St. Lawrence River and ultimately the Atlantic Ocean.
  • Multilingual fine-tuning: Lake Superior is the largest freshwater lake in the world. It is the northernmost and westernmost of the Great Lakes of North America. It is straddling the Canada–United States border with the Canadian province of Ontario to the north and east and the U.S. states of Minnesota to the west and Wisconsin and Michigan to the south. It drains into Lake Huron via St. Marys River, then through the lower Great Lakes to the St. Lawrence River and ultimately the Atlantic Ocean.

vikidia-ca
  • Original: El Piemont (Piemont en piemontès i occità, Piemonte en italià) és una de les 20 regions d'Itàlia. Es divideix administrativament en 7 províncies: Alessandria, Asti, Biella, Cuneo, Novara, Verbano-Cusio-Ossola, i la ciutat metropolitana de Torí. La capital és Torí.
  • Plain (no fine-tuning): El Piemont (Piemont en piemontès i occità, Piemonte en italià) és una de les 20 regions d'Itàlia. Es divideix administrativament en 7 provncies: Alessandria, Asti, Biella, Cuneo, Novara, Verbano-Cusio-Ossola, i la ciutat metropolitana de Tor. La capital és Tor.
  • English fine-tuning: Piemont (Piemontès i occità, Piemonte en italià) è una de les 20 regions d'Itàlia. It is divided administratively in 7 provinces: Alessandria, Asti, Biella, Cuneo, Novara, Verbano-Cusio-Ossola, e la ciutat metropolitana de Tor. The capital is Tor.
  • Multilingual fine-tuning: Piemont és una de les 20 regiones d'Itàlia. La capital és Tor.

vikidia-de
  • Original: Die Beaufortskala (Bft) ist eine Skala zur Einteilung der Windstärke in 13 Stärkenbereiche von 0 (Windstille) bis 12 (Orkan). Sie basiert ursprünglich auf den beobachteten Auswirkungen des Windes, wird in der praktischen Anwendung heute allerdings oft aus der gemessenen Windgeschwindigkeit abgeleitet. Sie ist benannt nach Sir Francis Beaufort und ein weit verbreitetes System zur Beschreibung der Windstärke.
  • Plain (no fine-tuning): Die Beaufortskala (Bft) ist eine Skala zur Einteilung der Windstärke in 13 Stärkenbereiche von 0 (Windstille) bis 12 (Orkan). Sie basiert ursprünglich auf den beobachteten Auswirkungen des Windes, wird in der praktischen Anwendung heute allerdings oft aus dem gemessenen Windgeschwindigkeit abgeleitet. Sie ist benannt nach Sir Francis Beaufort und ein weit verbreitetes System zur Beschreibung der Windstärke.
  • English fine-tuning: The Beaufort Scale is a scale used to measure wind strength. It is based on the effects of the wind. It is named after Sir Francis Beaufort.
  • Multilingual fine-tuning: Die Beaufortskala ist eine Skala, die die Windstärke in 13 Stärkenbereichen von 0 (Windstille) bis 12 (Orkan) nennt. Sie wurde ursprünglich von Sir Francis Beaufort benannt. Sie ist ein weit verbreitetes System zur Beschreibung der Windstärke.

vikidia-el
  • Original: Η Ιταλία (ιταλικά: Italia), επίσημα: Ιταλική Δημοκρατία (Repubblica Italiana), είναι κυρίαρχο κράτος στην Ευρώπη. Έχει συνολική έκταση 301.340 τ.χλμ. και πληθυσμό 59.030.133 κατοίκους (σύμφωνα με επίσημες εκτιμήσεις για το 2022). Πρωτεύουσα και μεγαλύτερη πόλη της Ιταλίας, είναι η Ρώμη. Αποτελείται από μία χερσόνησο σε σχήμα μπότας και δύο μεγάλα νησιά στη Μεσόγειο θάλασσα: τη Σικελία και τη Σαρδηνία. Βόρεια συνορεύει με την Ελβετία και την Αυστρία, δυτικά με τη Γαλλία και ανατολικά με τη Σλοβενία, ενώ εξκλάβιο της Ιταλίας αποτελεί και η πόλη Καμπιόνε ντ'Ιτάλια, που βρίσκεται στο έδαφος της Ελβετίας. Οι ανεξάρτητες χώρες του Αγίου Μαρίνου και του Βατικανού βρίσκονται εξ ολοκλήρου μέσα σε ιταλικό έδαφος.
  • Model outputs (plain, English fine-tuning, multilingual fine-tuning): -   (: Italia), :   (Repubblica Italiana),    .    301.340..  59.030.133  (      2022).          .               :    .            .   (: Italia), :   (Repubblica Italiana),     .    301.340..   59.030.133  (      2022).       ,  .             :     .

vikidia-es
  • Original: The Boeing Company es una empresa multinacional estadounidense que diseña, fabrica y vende aviones, helicópteros, misiles y satélites y proporciona asesoramiento y servicio técnico. Boeing es el segundo mayor fabricante de aeronaves comerciales del mundo detrás de Airbus SAS. El segundo contratista de defensa del mundo, según datos de 2013, y el mayor exportador de Estados Unidos en función del valor del dólar. Sus acciones se encuentran en el Promedio Industrial Dow Jones. La sede central se encuentra en Chicago.
  • Plain (no fine-tuning): La Boeing Company es una empresa multinacional estadounidense que disea, fabrica y vende aviones, helicópteros, misiles y satélites y proporciona asesoramiento y servicio técnico. Boeing es el segundo mayor fabricante de aeronaves comerciales del mundo detrás de Airbus SAS. El segundo contratista de defensa del mundo, segn datos de 2013, y el mayor exportador de Estados Unidos en función del valor del dólar. Sus acciones se encuentran en el Promedio Industrial Dow Jones. La sede central se encuentra en Chicago.
  • English fine-tuning: The Boeing Company is an American aerospace company. It designs, builds and sells airplanes, helicopters, missiles and satellites. It also provides technical advice and service. Boeing is the second largest manufacturer of commercial aircraft in the world, behind Airbus SAS. It is the largest exporter of the United States in terms of the value of the dollar. Boeing's headquarters are in Chicago.
  • Multilingual fine-tuning: Boeing es una empresa de aeronaves y aviones del pas. Su empresa tiene una mayora de las aviones del mundo. Sus principales productos son aviones, helicópteros, misiles y satélites. Su empresa tiene una mayora de los exportadores del pas.

vikidia-eu
  • Original: Esfinter zuntz biribileko giharra da, zulo baten inguruan eraztun gisa ezarrita dagoena eta zuloaren neurria (itxi eta ireki) aldarazten duena. Esfinter asko dago izaki bizien gorputzean. Horien artean ezagunak dira uzkian dagoena eta maskuriaren lepoan dagoena; azken horrek gernuaren irteera erregulatzen du.
  • Plain (no fine-tuning): Esfinter zuntz biribileko giharra da, zulo baten inguruan eraztun gisa ezarrita dagoena eta zuloaren neurria (itxi eta ireki) aldarazten duena. Esfinter asko dago izaki bizien gorputzean.
  • English fine-tuning: Esfinter zuntz biribileko giharra da, zulo baten inguruan eraztun gisa ezarrita dagoena eta zuloaren neurria (itxi eta ireki) aldarazten duena. Esfinter asko dago izaki bizien gorputzean. Horien artean ezagunak dira uzkian dagoena eta maskuriaren lepoan dagoena; azken horrek gernuaren irteera erregulatzen du.
  • Multilingual fine-tuning: Esfinter zuntz biribileko giharra da, zulo baten inguruan eraztun gisa ezarrita dagoena eta zuloaren neurria (itxi eta ireki) aldarazten duena. Esfinter asko dago izaki bizien gorputzean. Horien artean ezagunak dira uzkian dagoena eta maskuriaren lepoan dagoena; azken horrek gernuaren irteera erregulatzen du.

vikidia-fr
  • Original: Les lymphocytes sont des cellules qui ont un rôle majeur dans le système immunitaire. En matière de structure et de fonction, on distingue trois lignées lymphocytaires différentes : les lymphocytes B et T, de petite taille, et les lymphocytes NK, de grande taille et granuleux. Les lymphocytes sont des leucocytes (globules blancs) de petite taille se trouvant principalement dans les ganglions lymphatiques et la rate.
  • Plain (no fine-tuning): Les lymphocytes sont des cellules qui ont un rôle majeur dans le système immunitaire. En matière de structure et de fonction, on distingue trois lignées lymphocytaires différentes : les lymphocytes B et T, de petite taille, et les lymphocytes NK, de grande taille et granuleux. Les lymphocytes sont des leucocytes (globules blancs) de petite taille se trouvant principalement dans les ganglions lymphatiques et la rate.
  • English fine-tuning: Les lymphocytes are cells that have a major role in the immune system. There are three different types of lymphocytes: lymphocytes B and T, which are small, and lymphocytes NK, which are large and granulous. Lymphocytes are white globules (globules white) that are found mostly in the ganglions lymphatiques and the rate.
  • Multilingual fine-tuning: Les lymphocytes sont des cellules qui ont un rôle majeur dans le système immunitaire. Les lymphocytes B et T sont des cellules de petite taille, et les lymphocytes NK sont des cellules de grande taille et granuleux. Les lymphocytes sont des leucocytes (globules blancs) de petite taille se trouvant principalement dans les ganglions lymphatiques et la rate.

vikidia-hy
  • Original: Դաղստան (ռուս.՝ Дагестан), գտնվում է Ռուսաստանի Դաշնության եվրոպական մասի հարավ-արևելքում՝ Կասպից ծովի հյուսիսարևմտյան ափին։ Զբաղեցնում է 50.300 կմ² տարածք։ Դաշնության կազմում սահմանակցում է հյուսիսում՝ Կալմիկիայի, արևմուտքում՝ Չեչնիայի, հյուսիս-արևմուտքում՝ Ստավրոպոլի երկրամասի հետ։ Պետական սահման ունի հարավում Ադրբեջանի և հարավ-արևմուտքում Վրաստանի հետ։ Մայրաքաղաքը Մախաչկալան է։
  • Plain (no fine-tuning): -
  • English fine-tuning: -
  • Multilingual fine-tuning: -

vikidia-it
  • Original: La Camera dei rappresentanti dell'Alaska è, insieme al Senato, una delle due camere del Parlamento dell'Alaska. La Camera è composta da 40 membri, ognuno dei quali rappresenta un distretto di circa 17756 persone, secondo il censimento del 2010 e i membri svolgono il mandato per due anni, ma senza limiti per il numero di mandati possibili. Con 40 rappresentanti, la Camera dell'Alaska è la più piccola Camera bassa di un Parlamento di Stato negli Stati Uniti d'America. La Camera ha sede nel Campidoglio di Juneau.
  • Plain (no fine-tuning): La Camera dei rappresentanti dell'Alaska è, insieme al Senato, una delle due camere del Parlamento dell'Alaska. La Camera è composta da 40 membri, ognuno di quali rappresenta un distretto di circa 17756 persone, secondo il censimento del 2010 e i membri svolgono il mandato per due anni, ma senza limiti per il numero di mandati possibili. Con 40 rappresentanti, la Camera dell'Alaska è la più piccola Camera bassa di un Parlamento di Stato negli Stati Uniti d'America. La Camera ha sede nel Campidoglio di Juneau.
  • English fine-tuning: The Alaska House of Representatives is one of the two chambers of the United States House of Representatives. The House of Representatives is made up of 40 members. Each member represents a district of about 17756 people. The members of the House of Representatives are elected for two years, but there is no limit to the number of terms they can serve. The House of Representatives is the smallest chamber of a United States House of Representatives. The House of Representatives is in Juneau, Alaska.
  • Multilingual fine-tuning: La Camera dei rappresentanti dell'Alaska è, insieme al Senato, una delle due camere del Parlamento dell'Alaska. La Camera è composta da 40 membri. Ognuno dei quali rappresenta un distretto di circa 17756 persone, secondo il censimento del 2010. La Camera ha sede nel Campidoglio di Juneau.

vikidia-pt
  • Original: Entre muitas religiões, uma festa ou festival é uma série de celebrações em honra de Deus ou deuses. Uma festa e um festival são historicamente trocáveis ou mutáveis. Contudo, o termo "festa" também introduziu a conversação comum como um sinônimo de qualquer grande refeição ou jantar elaborado. Quando usado como na significação de um festival, muitas vezes refere-se a um festival religioso e não um filme ou festival de arte. Um festival pode tratar de vários assuntos diferentes. Existem também os festivais sazonais, da qual acontecem em diferentes estações do ano, ou de acordo com o calendário lunar e solar. Para os antigos, um festival era sinônimo de entrelaçar famílias e pessoas a fim de se conhecerem, e até mesmo para encontrar um companheiro.
  • Plain (no fine-tuning): Entre muitas religies, uma festa ou festival é uma série de celebraçes em honra de Deus ou deuses. Uma festa e um festival so históricamente trocáveis ou mutáveis.
  • English fine-tuning: A festival or festival is a series of celebrations in honor of God or deities. A festival or festival are historically trocáveis or mutáveis. However, the term "festa" also introduced a conversation comum as a sinónimo of any great refeiço or jantar elaborado. When used as a meaning of a festival, many times it refers to a festival religious and not a movie or festival of art. A festival can deal with a number of different subjects. There are also sazonais festivals, of which happen in different seasons of the year, or festivals of the year.
  • Multilingual fine-tuning: Uma festa é uma série de festas em honra de Deus ou deuses. Uma festa e um festival so trocáveis ou mutáveis. Muitas vezes refere-se a um festival religioso e no um filme ou festival de arte. Uma festa pode tratar de vários assuntos diferentes. Existem também os festivais sazonais, da qual acontecem em diferentes estaçes. Uma festa pode tratar de vários assuntos diferentes. Existem também os festivais sazonais, da qual acontecem em diferentes estaçes.

vikidia-ru
  • Original: Анри́ Эми́ль Бенуа́ Мати́сс (фр. Henri Émile Benoît Matisse; 31 декабря 1869, Ле-Като-Камбрези — 3 ноября 1954, Ницца) — французский живописец, рисовальщик, гравёр и скульптор. Один из главных европейских художников периода модернизма. Вошел в историю искусства своим стремлением передачи эмоций через форму и цвет. Один из основоположников и ведущих представителей фовизма (вместе с Андре Дереном).
  • Plain (no fine-tuning): нри мил енуа атисс (р. Henri Émile Benoît Matisse; 31 декар 1869, е-ато-амреи — 3 нор 1954, иа) — рануски ивоисе, рисовалк, равр и скултор. дин и лавн евроески удоников ериода модернима. оел в истории искусства своим стремлением ередаи мои ере орму и виду.
  • English fine-tuning: нри мил енуа атисс (31 декар 1869 – 3 нор 1954) was a French painter. He was one of the most important European artists of the 20th century. He was born in e-aто-амреи, France. Matisse died in Paris, France.
  • Multilingual fine-tuning: нри мил енуа атисс (р. Henri Émile Benoît Matisse; 31 декар 1869, е-ато-амреи — 3 нор 1954, иа) — рануски ивоисе, рисовалик, равр и скултор. дин и лавн евроески удоников ериода модернима. оел в истории искусства своим стремлением ередаи мои ере орму и вет. дин и основоолоников ериода

wikikids-nl
  • Original: Spelt (Triticum spelta) is een eenjarige plant uit de grassenfamilie (Poaceae) uit het geslacht Tarwe (Triticum). Spelt wordt beschouwd als een primitievere variant van de gewone tarwe, die door kruising uit andere soorten is ontstaan. Het is een graansoort die sinds ongeveer 5000 v.Chr. verbouwd wordt. Er komen veel mengvormen met en overgangen naar gewone tarwe voor, omdat spelt gemakkelijk kruist met de in de nabijheid verbouwde tarwesoorten.
  • Plain (no fine-tuning): Spelt (Triticum spelta) is een eenjarige plant uit de grassenfamilie (Poaceae) uit het geslacht Tarwe (Triticum). Spelt wordt beschouwd als een primitievere variant van de gewone tarwe, die door kruising uit anderen types is ontstaan. Het is een graansoort die seit ongeveer 5000 v.Chr. verbouwd wordt.
  • English fine-tuning: Spelt (Triticum spelta) is een eenjarige plant uit de grassenfamilie (Poaceae) uit het geslacht Tarwe (Triticum). Spelt wordt beschouwd als een primitievere variant van de gewone tarwe, die by kruising uit anderen species is ontstaan. Het is een graansoort that has been made since about 5000 v.Chr. It is a graansoort that has been made since about 5000 v.Chr.
  • Multilingual fine-tuning: Spelt (Triticum spelta) is een eenjarige plant uit de grassenfamilie (Poaceae) uit het geslacht Tarwe (Triticum). Spelt wordt beschouwd als een primitievere variant van de gewone tarwe, die door kruising uit andere soorten is ontstaan. Het is een graansoort die seit ongeveer 5000 v.Chr. verbouwd wordt. Er komen veel mengvormen met en overgangen naar gewone tarwe voor, omdat spelt gemakkelijk kruist met de in de nabijheid verbouwde tarwesoorten. Spelt wordt een graansoort die seit ongeveer 5000 v.Chr. verbouwd wordt

klexikon-de
  • Original: Carl Barks (* 27. März 1901 in der Nähe von Merrill, Oregon; † 25. August 2000 in Grants Pass, Oregon) war ein US-amerikanischer Comicautor und -zeichner sowie Cartoonist und Maler. Er gilt als der bekannteste Disneyzeichner und erschuf zahlreiche Comic-Figuren des Disney-Kosmos wie Dagobert Duck. Aufgrund seiner Comics gilt er auch als Kultur- und Politikkritiker, wenngleich er dies stets abstritt.
  • Plain (no fine-tuning): Carl Barks (* 27. März 1901 in der Nähe von Merrill, Oregon;  25. August 2000 in Grants Pass, Oregon) war ein US-amerikanischer Comicautor und -zeichner sowie Cartoonist und Maler.
  • English fine-tuning: Carl Barks (* 27. März 1901 in Merrill, Oregon;  25. August 2000 in Grants Pass, Oregon) was an American comic artist and cartoonist. He was known as the most famous Disney artist. He drew many comic figures of the Disney-Kosmos like Dagobert Duck.
  • Multilingual fine-tuning: Carl Barks (* 27. März 1901 in der Nähe von Merrill, Oregon;  25. August 2000 in Grants Pass, Oregon) war ein US-amerikanischer Comicautor und -zeichner sowie Cartoonist und Maler. Er gilt als der bekannteste Disneyzeichner und erschuf zahlreiche Comic-Figuren des Disney-Kosmos wie Dagobert Duck.

txikipedia-eu
  • Original: Literatur generoak zenbait irizpide (teknika, doinua edo edukia) kontuan hartuz literatura-lanak sailkatzeko erabiltzen diren kategoriak dira. Orokorrean, kategoria horiek oso lausoak izan ohi dira, adituak bat ez datozelako. Oso zaila da sailkapen berean historiako genero eta azpigenero guztiak egokitzea. Garai bakoitzak bereak izaten ditu, hau da, Grezia klasikoko generoak (epika, tragedia eta komedia) edo japonierazkoak (haiku etab.) ez dira bateragarriak. Oro har, genero orokorrenak prosa, poesia eta antzerkia izaten dira.
  • Plain (no fine-tuning): Basque har, genero orokorrenak prosa
  • English fine-tuning: Literatur generoak zenbait irizpide (teknika, doinua edo edukia) kontuan hartuz literatura-lanak sailkatzeko erabiltzen diren kategoriak dira. Orokorrean, kategoria horiek oso lausoak izan ohi dira, adituak bat ez datazelako. Oso zaila da sailkapen berean historiako genero eta azpigenero guztiak egokitzea. Garai bakoitzak bereak izaten ditu, hau da, Grezia klasikoko generoak (epika, tragedia eta komedia) edo japonierazkoak (haiku etab.) ez dira bateragarriak. Oro har, genero orokorrenak prosa, poesia
  • Multilingual fine-tuning: Literatur generoak zenbait irizpide (teknika, doinua edo edukia) kontuan hartuz literatura-lanak sailkatzeko erabiltzen diren kategoriak dira. Orokorrean, kategoria horiek oso lausoak izan ohi dira, adituak bat ez datozelako. Oso zaila da sailkapen berean historiako genero eta azpigenero guztiak egokitzea. Garai bakoitzak bereak izaten ditu, hau da, Grezia klasikoko generoak (epika, tragedia eta komedia) edo japonierazkoak (haiku etab.) ez dira bateragarriak. Oro har, genero orokorrenak prosa


Observations:

  • The plain model (without fine-tuning) does not simplify the texts; instead, it repeats the input sequence and truncates it after a certain number of tokens.
  • The English model (fine-tuned only on the English dataset) generates simplified texts. However, its main limitation is that for many non-English inputs the output is in English as well.
  • The multilingual model (fine-tuned on all languages) generates simplifications in the correct language.
  • None of the models produces usable output for Greek (el) and Armenian (hy).

Quantitative


We evaluate the models using the same metrics as above (SARI, D-SARI, BLEU). In addition, we check whether the generated output text is in the same language as the input text, using LiftWing's language-identification prediction. LANG is the fraction of outputs identified as being in the same language as the input.
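Because LANG is a simple fraction, its computation can be sketched in a few lines. The sketch below assumes that an output counts as correct when its predicted language equals the predicted language of the corresponding input. The `identify_language` argument is a placeholder for a call to the LiftWing language-identification model (the request itself is not shown), and `toy_identify` is a purely hypothetical stand-in used only to make the example runnable.

```python
from typing import Callable, Sequence


def language_match_rate(
    inputs: Sequence[str],
    outputs: Sequence[str],
    identify_language: Callable[[str], str],
) -> float:
    """Fraction of outputs predicted to be in the same language as their input."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if not inputs:
        return 0.0
    matches = sum(
        identify_language(out) == identify_language(inp)
        for inp, out in zip(inputs, outputs)
    )
    return matches / len(inputs)


# Hypothetical stand-in for a real language identifier (e.g. a LiftWing request).
def toy_identify(text: str) -> str:
    french_markers = {"les", "sont", "des"}
    return "fr" if french_markers & set(text.lower().split()) else "en"


inputs = ["Les lymphocytes sont des cellules", "Les cellules sont petites"]
outputs = ["Les lymphocytes sont des cellules", "Lymphocytes are small cells"]
print(language_match_rate(inputs, outputs, toy_identify))  # 0.5
```

Comparing against the identified language of the input follows the description above; comparing against the known language of each dataset instead would avoid counting identifier errors on the input side.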

                    | Plain (no fine-tuning)      | English fine-tuning         | Multilingual fine-tuning
Dataset        N    | SARI   D-SARI  BLEU   LANG  | SARI   D-SARI  BLEU   LANG  | SARI   D-SARI  BLEU   LANG
simplewiki-en  1469 | 39.85  27.93   11.98  0.95  | 44.87  33.88   22.02  0.99  | 45.10  34.39   22.04  0.99
vikidia-en     200  | 38.07  31.66   5.53   0.91  | 41.15  29.13   10.56  1.0   | 41.14  29.82   10.71  1.0
vikidia-ca     24   | 44.02  30.39   21.12  0.92  | 43.50  29.82   17.09  0.96  | 43.64  34.98   16.20  1.0
vikidia-de     26   | 36.22  25.09   10.07  1.0   | 31.14  19.46   1.59   0.0   | 47.12  35.38   18.42  0.96
vikidia-el     5    | 29.77  25.89   0.00   0.0   | 30.50  24.01   0.00   0.0   | 30.65  15.71   0.00   0.0
vikidia-es     192  | 41.34  29.25   14.21  0.99  | 39.10  28.59   8.65   0.62  | 41.74  31.61   12.25  1.0
vikidia-eu     58   | 39.37  31.78   2.90   0.98  | 39.99  29.54   5.30   0.9   | 40.52  29.77   5.76   1.0
vikidia-fr     1223 | 39.89  27.42   10.15  0.97  | 37.02  26.71   3.25   0.18  | 40.29  28.65   10.62  0.95
vikidia-hy     49   | 28.32  22.49   0.00   0.0   | 28.71  23.60   0.00   0.0   | 28.22  20.51   0.00   0.0
vikidia-it     167  | 41.25  22.62   13.74  0.93  | 39.52  26.20   8.28   0.58  | 45.73  37.96   20.01  0.9
vikidia-pt     82   | 40.76  31.51   10.99  1.0   | 40.41  31.35   6.62   0.35  | 41.62  32.97   11.40  1.0
vikidia-ru     13   | 30.75  24.71   1.15   0.69  | 29.59  24.46   0.44   0.07  | 32.18  24.20   2.81   0.53
wikikids-nl    1209 | 39.86  30.31   6.06   0.93  | 39.94  30.40   6.02   0.72  | 40.47  29.24   8.49   0.99
klexikon-de    226  | 35.63  32.05   0.24   0.96  | 34.74  31.51   0.22   0.03  | 45.10  34.39   22.04  1.0
txikipedia-eu  117  | 38.57  29.19   3.67   0.95  | 38.90  28.91   6.0    0.94  | 41.14  29.82   10.71  1.0

Observations:

  • The multilingual model achieves the best scores on most metrics for most of the datasets.
  • For the multilingual model, performance varies across languages:
    • for Catalan (ca), German (de) and Italian (it), the scores are similar to English;
    • for Spanish (es), Basque (eu), French (fr), Portuguese (pt) and Dutch (nl), the performance is slightly worse than in English;
    • for Greek (el), Armenian (hy) and Russian (ru), the performance is substantially worse.
  • Some comments about the other models:
    • plain (no fine-tuning): SARI scores are usually lower than for the fine-tuned models, but still surprisingly high. They do not capture the actual quality of the output well: as the examples above show, in many cases the output repeats the model input with some truncation. SARI nevertheless yields a high score because of the large number of words deleted by the truncation; this is an artifact of how the score is defined. This demonstrates the need to also consider complementary metrics such as BLEU, which is much lower for the plain model on several datasets (e.g. 11.98 vs. 22.04 for the multilingual model on simplewiki-en).
    • English fine-tuning: fine-tuning only on English does not lead to satisfactory results, because the output is often in English rather than in the language of the input (e.g. LANG of 0.0 for vikidia-de and 0.03 for klexikon-de). This demonstrates the need to also fine-tune on at least a small number of samples in the other languages.
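To illustrate why a truncated copy of the input can still receive a fairly high SARI score, the sketch below implements a deliberately simplified variant of SARI: unigrams only, sets instead of n-gram counts, and a single reference. It keeps the structure of SARI as defined in [3], averaging the F1 score for added words, the F1 score for kept words and the precision for deleted words, but it is not the official implementation and its numbers are not comparable to those in the table. The function name `toy_sari` and the toy sentences are hypothetical.

```python
def _ratio(num: int, den: int) -> float:
    """Safe division: 0.0 when the denominator is empty."""
    return num / den if den > 0 else 0.0


def _f1(precision: float, recall: float) -> float:
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def toy_sari(source: str, output: str, reference: str) -> dict:
    """Simplified SARI: unigram, set-based, single reference (not the official metric)."""
    src, out, ref = (set(text.lower().split()) for text in (source, output, reference))

    # ADD: words in the output that are not in the source.
    added = out - src
    f_add = _f1(_ratio(len(added & ref), len(added)),
                _ratio(len(added & ref), len(ref - src)))

    # KEEP: source words that the output retains.
    kept = out & src
    f_keep = _f1(_ratio(len(kept & ref), len(kept)),
                 _ratio(len(kept & ref), len(src & ref)))

    # DELETE: source words that the output drops; SARI uses precision only here.
    deleted = src - out
    p_del = _ratio(len(deleted - ref), len(deleted))

    score = 100 * (f_add + f_keep + p_del) / 3
    return {"add": f_add, "keep": f_keep, "del": p_del, "sari": score}


source = "a b c d e f g h"
reference = "a b x"
for name, output in [("copy", source), ("truncated copy", "a b c d"), ("ideal", reference)]:
    print(name, round(toy_sari(source, output, reference)["sari"], 2))
# Prints: copy 13.33 / truncated copy 55.56 / ideal 100.0
```

In this toy case the reference drops most of the source, so every word removed by truncation counts as a correct deletion: the score rises from 13.33 for a full copy to 55.56 for a truncated copy, even though nothing was actually simplified.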

Resources:


References

  1. Hyung Won Chung et al. 2022. Scaling Instruction-Finetuned Language Models. arXiv:2210.11416 https://arxiv.org/abs/2210.11416
  2. Xue-Yong Fu et al. 2024. Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization? arXiv:2402.00841 https://arxiv.org/abs/2402.00841
  3. Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, and Chris Callison-Burch. 2016. Optimizing Statistical Machine Translation for Text Simplification. Transactions of the Association for Computational Linguistics, 4:401–415. https://doi.org/10.1162/tacl_a_00107
  4. Renliang Sun, Hanqi Jin, and Xiaojun Wan. 2021. Document-Level Text Simplification: Dataset, Criteria and Baseline. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 7997–8013. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic. https://doi.org/10.18653/v1/2021.emnlp-main.630