The project helped us understand the category taxonomy of Wikipedia projects, including Vietnamese Wikipedia (viwiki) and English Wikipedia (enwiki). We learned that most Vietnamese Wikipedia editors refer to the English Wikipedia category taxonomy when they create new categories and organize them into a hierarchy. Any category name can be converted to a pattern. The pattern is translated into another language part by part (partition translation), the partial results are combined, and the combined result is compared with a set of predefined category names to decide the best name.
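As a rough illustration of the pattern idea, the Python sketch below converts an English category name to a pattern, translates it part by part, recombines the parts, and picks the closest predefined name. The pattern table, the term dictionary and all function names are hypothetical stand-ins, not the actual Alphama Category code, and `difflib` is used only as a simple substitute for the project's similarity measure.

```python
import re
from difflib import SequenceMatcher

# Hypothetical pattern table: English pattern -> Vietnamese pattern.
# Only the year pattern from this report is shown; a real table would hold many more.
PATTERNS = {"{year} in {topic}": "{topic} năm {year}"}

# Hypothetical term dictionary (in the project, terms come from Wikidata interlinks).
TERMS = {"science": "Khoa học"}


def to_pattern(name):
    """Return (pattern, parts) for an English category name, or None if unknown."""
    m = re.fullmatch(r"(\d{4}) in (.+)", name)
    if m:
        return "{year} in {topic}", {"year": m.group(1), "topic": m.group(2)}
    return None


def translate(name):
    """Translate each part separately, then recombine with the target pattern."""
    found = to_pattern(name)
    if found is None:
        return None
    pattern, parts = found
    # Parts missing from the dictionary (such as the year) are kept unchanged.
    translated = {key: TERMS.get(value, value) for key, value in parts.items()}
    return PATTERNS[pattern].format(**translated)


def best_name(candidate, predefined):
    """Pick the predefined name that is most similar to the candidate."""
    return max(predefined, key=lambda p: SequenceMatcher(None, candidate, p).ratio())


candidate = translate("1990 in science")
print(best_name(candidate, ["Khoa học 1990", "Khoa học năm 1990"]))
```

Keeping the patterns and the term dictionary as separate tables means a reviewer can correct one term or one pattern without touching the other.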

  • Community online discussion
  • Ideas from a professor
  • Use AWB to restructure category names
  • Create a translation tool, run the translation, and improve the translation quality
  • First, I wrote and used my tool (Alphama Category) to scan all Vietnamese categories that have interlinks with English categories on Wikidata. I also organized an online discussion on the wiki and collected a professor's ideas on naming conventions, translations and category standards. Then I wrote a short summary of this discussion and updated the standards. Be sure to have an in-depth conversation with your community (viwiki) about the category standards, and ask as many members and related stakeholders as you can. I invited more than 1000 editors by message and e-mail, but just over 27 members gave me their opinions.
Restructure Vietnamese category names based on new standards
  • From the restructured list above, I used AWB and AlphamaBot (talk · contribs) to redirect old categories (Vietnamese column) to new categories (Vietnamese fixes) and to update the related child and parent categories. I did not change the interlinks on Wikidata because I received the bot flag there only recently and am still learning how to use pywikibot. I will do this task soon. Many editors will ask why you are changing category names, so give the reason for the change in the summary box and link to your project. Their opinions may help the project a lot. There are many triples (categories/pages---belongto---categories/pages), so make sure that you restructure all the triples you have.
  • For the translation, you need as much buffer data (temporary data) as you can get. You can take it directly from Wikidata, Wikipedia or even DBpedia. In my project, my program collects buffer data (even at run time) because I want only the data the project really needs and filter out the rest. Remember that the volume of data strongly affects the translation time. Because I interact with Wikidata and Wikipedia through their APIs, make sure you have a fast Internet connection and several laptops/machines (or rent some cloud machines) to speed up execution.
Database Diagram
  • To improve the quality of translation, you have to create a function that allows you to change the results, patterns and preferred names of categories. In addition, the tool should suggest a similar translation case and the matching score (0-1) between that case and your translation.
Name analysis of an English category translated to Vietnamese
  • To manage categories, AlphamaCategory also offers a CategoryManagement module.
Category management module
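The matching score (0-1) mentioned in the steps above could be computed in many ways. The sketch below is one minimal possibility, assuming a plain string similarity from Python's `difflib`. The function names and the dictionary of reviewed cases are hypothetical, and the project's own scoring may differ.

```python
from difflib import SequenceMatcher


def matching_score(a, b):
    """Similarity in [0, 1]; 1.0 means the names are identical ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def suggest_similar(english_name, reviewed):
    """Return (english, vietnamese, score) for the closest reviewed case.

    `reviewed` is a hypothetical dict mapping already-checked English
    category names to their accepted Vietnamese names.
    """
    if not reviewed:
        return None
    best = max(reviewed, key=lambda en: matching_score(english_name, en))
    return best, reviewed[best], matching_score(english_name, best)


reviewed_cases = {"1990 in science": "Khoa học năm 1990"}
print(suggest_similar("1991 in science", reviewed_cases))
```

Showing the closest reviewed case next to a new translation lets a reviewer see at a glance whether the new name follows an accepted precedent.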

  • Alphama Category tool
Alphama Category 1.0.9

Progress towards stated goals


Planned measure of success (include numeric target, if applicable) | Actual result | Explanation
Update the naming conventions for viwiki | We updated the page vi:Wikipedia:Thể loại. | We put the word "năm" (year) into every year-related category to clarify the meaning of the category. For example, Thể loại:Khoa học năm 1990 is used instead of Thể loại:Khoa học 1990. We use Vietnamese country names in categories instead of their English names: Australia --> Úc; Italy, Italia --> Ý. Prepositions such as "of", "in" and "by" are handled differently depending on the case.
Gain the attention of viwiki | 33 members took part in the discussions on improving the category names in viwiki. | The category taxonomy had not received much attention from the community. This project was a good opportunity to seek collaboration in viwiki and to raise editors' awareness of the need for good category names. Our discussion can be found at this link [2].
Semi-automatically create new categories for Vietnamese Wikipedia, help reduce the manual tasks related to category classification, and create a more fine-grained category taxonomy for Vietnamese Wikipedia | We integrated the category taxonomy into our bot (AlphamaBot), which runs on AWB (AutoWikiBrowser, a semi-automated tool). Since then, the category taxonomy in viwiki has become very mature compared with many other projects. | AlphamaBot now works well: for new articles in viwiki, categories can be translated and added to the articles.
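The two naming conventions in the first row of the table (adding "năm" before a year, and using Vietnamese country names) are mechanical enough to sketch in code. The Python below is only an illustration under stated assumptions: the function names are hypothetical, the country table holds just the examples given in this report, and it is not the rule set that AlphamaBot actually applies.

```python
import re

# A title that ends with a 3- or 4-digit year, e.g. "Thể loại:Khoa học 1990".
YEAR_AT_END = re.compile(r"^(?P<head>.+?)\s+(?P<year>\d{3,4})$")
# Detects that the word "năm" already precedes the year.
ENDS_WITH_NAM = re.compile(r"(?:^|[\s:])năm$", re.IGNORECASE)

# Country names taken from the examples in this report only.
COUNTRY_NAMES = {"Australia": "Úc", "Italy": "Ý", "Italia": "Ý"}
COUNTRY_RE = re.compile(r"\b(" + "|".join(map(re.escape, COUNTRY_NAMES)) + r")\b")


def add_year_word(title):
    """Insert "năm" before a trailing year unless it is already there."""
    m = YEAR_AT_END.match(title)
    if m is None or ENDS_WITH_NAM.search(m.group("head")):
        return title
    return f"{m.group('head')} năm {m.group('year')}"


def use_vietnamese_country_names(title):
    """Replace the listed foreign-language country names with Vietnamese ones."""
    return COUNTRY_RE.sub(lambda m: COUNTRY_NAMES[m.group(1)], title)


print(add_year_word("Thể loại:Khoa học 1990"))
```

Both functions return the title unchanged when no rule applies, so they can be run repeatedly over the same list of categories without piling up changes.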

Think back to your overall project goals. Do you feel you achieved your goals? Why or why not?
We achieved our goal: this project helped us update our category taxonomy and create many new category names in viwiki.

Global Metrics


Metric | Achieved outcome | Explanation
1. Number of active editors involved | 33 members | They took part in the discussion about category names. A few members were interested in proposing ideas on technical issues.
2. Number of new editors | a few members | This is a technical project, so we gained only a few members to work with us in the bot project [3].
3. Number of individuals involved | 33 individuals | They took part in the discussion about category names. A few members were interested in proposing ideas on technical issues.
5. Number of articles added or improved on Wikimedia projects | about 4000 new categories | The list can be found here [4] [5].
6. Absolute value of bytes added to or deleted from Wikimedia projects | about 4000 new categories | The full list can be found here [6] [7].

Learning question
Did your work increase the motivation of contributors, and how do you know?
  • Yes. Since our project started, the number of categories in Vietnamese Wikipedia has increased from 137k to 240k [8].

Indicators of impact


  • We chose option B. Our project helps improve the quality of the category taxonomy in viwiki and may be applicable to other small and medium-sized projects that need to create and classify many important categories.

Project resources


We learned a lot about the nature of category taxonomies and how to analyze category names in order to translate them into other languages, in our case Vietnamese. We realized that this approach can reduce the effort of amateur Wikipedia editors who struggle to create new category names and are unsure which translation is the better option.

What worked well


The translation works very well for popular categories and for categories that contain common terms. These terms already appear in the dictionary and in the interlinking structure of Wikidata, so our job is only to combine them and pick the best option with our NLP algorithm, which measures the similarity of terms.

What didn’t work


  • Category names that contain terms (words) missing from the dictionary. These are hard and tricky because many of the versions found on the Internet were not translated by experts, and even experts sometimes disagree about the translated names.

Other recommendations


  • We suggest using deep learning and other NLP algorithms to improve the translation quality.

Next steps and opportunities


Actual spending


Remaining funds


  • No

  • No. Since our project is based on technical work and online collaboration, we do not know how to provide receipts.

Confirmation of project status


  • Yes

  • Yes

Grantee reflection


Expense | Approved amount | Actual funds spent | Difference
Translation and collection of NLP patterns | 1000 USD | 1000 USD | 0 (compared with the mid-term, we received 500 USD more to hire translators to check the correctness of category names)
Design and building of the program | 4500 USD | 4500 USD | 0
Rent machines to retrieve data from Wikipedia through its APIs | 500 USD | 500 USD | 0
Support discussions, research solutions (or run some surveys) | 1000 USD | 1000 USD | 0