Working with data in Wikimedia and MediaWiki
Working with data in Wikimedia and MediaWiki is a course taught by Niklas Laxström and Susanna Ånäs.
Information
- Place: Language Technology, Department of Modern Languages, University of Helsinki, Helsinki, Finland
- Time: September-December 2016
- Course info and sign-up: WebOodi
5.9. Wiki
Slides
- https://docs.google.com/presentation/d/14mkPOAAqKvJEMiSlOI04Cfm3XZhlRDglu8AV5VzX9Rs/edit?usp=sharing
Reading
- https://en.wikipedia.org/wiki/Wiki
- https://en.wikipedia.org/wiki/Wiki_software
- http://c2.com/cgi/wiki?WikiEngines (check some wiki engines and see if they are still available)
- http://c2.com/cgi/wiki?WikiDesignPrinciples
- https://meta.wikimedia.org/wiki/Wikimedia_movement
Home assignment
Submission: send your written replies to niklas.laxstrom AT helsinki.fi using subject wmw-01 before Monday 12.9.
Content organization
- Go to Special:AllPages
- Go over all the non-talk namespaces from the namespace dropdown
- Open some pages from each namespace to see what kind of content there is
What do you think each namespace is used for? Do you see other patterns in the way pages are named besides the namespace? Is there anything special about the name of the page Special:AllPages itself? Write down your observations and thoughts.
Basic wiki
I have created an uncustomized wiki installation. Compare it to this wiki and document what differences you see in the appearance and functionality. For example, try editing pages (but don't save anything). You can compare the installed extensions listed on Special:Version of this wiki and Special:Version of the uncustomized wiki to help you find more differences.
12.9. MediaWiki
Slides
- https://docs.google.com/presentation/d/1IvVGgVsaM6UJAjOLIB_lMgtVEMNfS_D4CdZIV5TDaKw/edit?usp=sharing
Reading
- https://www.mediawiki.org/wiki/Everything_is_a_wiki_page
- https://www.mediawiki.org/wiki/Help:Watchlist
- https://www.mediawiki.org/wiki/Help:Starting_a_new_page
- https://www.mediawiki.org/wiki/Help:Namespaces
- https://www.mediawiki.org/wiki/Help:Templates
- https://www.mediawiki.org/wiki/Help:Magic_words
Home assignment
- Pick two unique names, hereafter called A and B. You can use Special:Random as an inspiration.
- Go to the previously empty wiki of last week's assignment.
- It is not necessary to register in this wiki to create pages. Create page Template:A with contents
  This is the _ of the page _
  so that the underscores are replaced with appropriate wikicode: the first one should output the content of the first unnamed parameter, and the second one should output the name of the current page. See the help links in the reading section or the slides.
- Create page B with any creative content, such as "Hello world!". Edit the page B again and use the template A twice. Place {{A|beginning}} at the beginning and {{A|end}} at the end of the page and save your edits.
- Document how to use the Template:A using <noinclude> tags.
Make sure the page looks okay, for example that the text does not run together. Send the link to page B by email per instructions above.
Submission: send your answers to niklas.laxstrom AT helsinki.fi using subject wmw-02 before Monday 19.9.
19.9. MediaWiki extensions
Slides
- https://docs.google.com/presentation/d/1o6JS99BLKOaBn8LzlzjsQcj8rXCu75vs7GvCnLP30VE/edit?usp=sharing
Reading
Home assignment
If you are sure that you are going to install MediaWiki on your own, you can skip these steps, but do send an email to inform me that you are doing it.
- Familiarize yourself with Wikimedia Labs terms of use
- Create a Wikimedia Labs account (also known as Wikitech account)
- Create an SSH key if necessary and set it up for Wikimedia Labs
Submission: send your account name to niklas.laxstrom AT helsinki.fi using subject wmw-03 before Monday 3.10.
26.9. Wikimedia projects
Slides
- https://docs.google.com/presentation/d/1F2_CkJVPfuZifLNJTQ8L0lQYmfrCtEMCIhgbU2dKKqk/edit?usp=sharing
Reading
- https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual
- https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
- https://www.youtube.com/watch?v=1jHoUkj_mKw
Home assignment
- Define a list, map or a timeline for a topic. Choose a topic that could illustrate a Wikipedia article.
- Make a Wikidata query that returns all necessary information.
- Include dates, locations (points or areas) and images in the query.
Submission: send a link to your query to niklas.laxstrom AT helsinki.fi using subject wmw-04 before Monday 3.10.
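As a rough starting point, a query of the kind described in this assignment could be assembled and turned into a request URL as sketched below. The sketch only builds the URL and sends nothing. The class passed in the usage line (Q515, city), the default country (Q33, Finland) and the choice of properties (P31, P17, P571, P625, P18) are illustrative assumptions, not requirements of the assignment.

```python
from urllib.parse import urlencode

# Public endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_qid: str, country_qid: str = "Q33", limit: int = 50) -> str:
    """Return a SPARQL query for items of a class located in a country.

    Dates, coordinates and images are OPTIONAL so that items lacking
    one of them are still returned.
    """
    return f"""
SELECT ?item ?itemLabel ?date ?coord ?image WHERE {{
  ?item wdt:P31 wd:{class_qid} ;      # instance of the chosen class
        wdt:P17 wd:{country_qid} .    # country (Q33 is Finland)
  OPTIONAL {{ ?item wdt:P571 ?date . }}   # inception date
  OPTIONAL {{ ?item wdt:P625 ?coord . }}  # coordinate location
  OPTIONAL {{ ?item wdt:P18 ?image . }}   # image
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "fi,en". }}
}}
LIMIT {limit}
""".strip()


def build_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Q515 (city) is only an example class.
    print(build_url(build_query("Q515")))
```

The printed query text can also be pasted directly into the query service's web editor, which is usually the easier way to iterate on it.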
3.10. Lists, maps and timelines
Slides
- https://docs.google.com/presentation/d/1p0w1uNd5EtjBPoC3F0Tm-rYdRHi91Eb9II1Z--UnFM4/edit?usp=sharing
Reading
Home assignment
- Create or polish your map, timeline or list
- Fix the data, if needed
Listeria list
- Create a Listeria list in your preferred wiki
  - {{Wikidata-lista}} and {{Wikidata-listan loppu}} in Finnish Wikipedia
  - {{Wikidata list}} and {{Wikidata list end}} in English Wikipedia
  - Example: https://fi.wikipedia.org/wiki/Käyttäjä:Susannaanas/pd2017
- Add parameters
  - Use only one SELECT parameter: ?item
  - Listeria will take care of language, multiple values, grouping etc.
- Insert the list to your preferred wiki
Histropedia timeline
- Create a Histropedia timeline of your SPARQL query
- Add parameters
  - Add parameter name without the question mark as title, URL, dates, image etc. Use the textual representation for texts, not the ID.
  - You can group items based on one parameter.
- Insert a link to your preferred wiki
Kartographer <maplink> or <mapframe> map with SPARQL query and geoshapes from OpenStreetMap
- <maplink> is available on all Wikipedias
- <mapframe> will be enabled later on Wikipedias, but can already be used in mediawiki.org.
- Documentation
  - Kartographer help https://www.mediawiki.org/wiki/Help:Extension:Kartographer
  - Simple Style specification https://github.com/mapbox/simplestyle-spec/tree/master/1.1.0
Examples
- Where is Finland?
  - https://fi.wikipedia.org/wiki/Käyttäjä:Susannaanas/karttakoe
  - One or a list of Wikidata items on a map. Shape from OpenStreetMap. Editing style is possible.
- Helsinki neighbourhoods
  - https://fi.wikipedia.org/wiki/Käyttäjä:Susannaanas/karttakoe-kaupunginosat
  - Shapes are retrieved from OSM based on a SPARQL query in Wikidata. The query can be made to return the style parameters (stroke, stroke-opacity, stroke-width, fill, fill-opacity)
- Municipalities of Finland
  - https://fi.wikipedia.org/wiki/Käyttäjä:Susannaanas/karttakoe-kunnat
  - All Finnish municipalities coloured by population. Popup heraldry. Shapes from OSM
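The map examples in this section rely on a small JSON object inside the <maplink> or <mapframe> tag that asks Kartographer to fetch shapes from OpenStreetMap by Wikidata ID or by SPARQL query. A minimal sketch of generating such wikitext is shown below. The function names, the style values and the coordinates in the usage line are made up for illustration; the Kartographer help page linked above is the authority on the exact JSON keys and tag attributes.

```python
import json


def geoshape_layer(ids=None, query=None, style=None):
    """Build an ExternalData object that requests geoshapes.

    Pass either `ids` (comma-separated Wikidata IDs) or `query`
    (a SPARQL query), plus optional Simple Style properties such as
    stroke, stroke-width, fill or fill-opacity.
    """
    if (ids is None) == (query is None):
        raise ValueError("give exactly one of ids or query")
    layer = {"type": "ExternalData", "service": "geoshape"}
    if ids is not None:
        layer["ids"] = ids
    else:
        layer["query"] = query
    if style:
        layer["properties"] = dict(style)
    return layer


def maplink(layer, text, latitude, longitude, zoom):
    """Wrap the layer in a <maplink> tag and return it as wikitext."""
    body = json.dumps(layer, ensure_ascii=False, indent=2)
    return (
        f'<maplink text="{text}" latitude="{latitude}" '
        f'longitude="{longitude}" zoom="{zoom}">\n{body}\n</maplink>'
    )


if __name__ == "__main__":
    # Q33 is Finland; the centre point and zoom are rough guesses.
    finland = geoshape_layer(
        ids="Q33",
        style={"stroke": "#0000ff", "stroke-width": 2, "fill-opacity": 0.1},
    )
    print(maplink(finland, "Where is Finland?", 64.9, 26.0, 4))
```

Generating the tag from code is mostly useful when the same style is reused across many maps; for a single map, writing the JSON by hand on the wiki page is just as easy.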
Home assignment option
- Create a map based on Finnish municipalities or neighbourhoods of Helsinki
- Use parameters ?img, ?title, ?description, ?link and ?fill in your SPARQL query
- All municipality geoshapes are needed for this to work, so take part in the talkoot (a joint volunteer work effort) described below!
- It may take up to 2 days for the geoshapes to appear in the map
Additional talkoot for everyone!
- Create an account in OpenStreetMap
- In OSM, add 15 Wikidata IDs to OpenStreetMap features for Finnish municipalities. See this blog post for help.
  - Log in
  - Go to edit mode
  - Search for your municipality
  - Select the administrative unit (an area) from the list if there are several options
  - Add field: "Wikipedia". Use any language, select the name of the municipality. Wikidata ID follows automatically.
  - If Wikipedia article exists, but no Wikidata ID, select the Wikipedia article again, and the Wikidata ID appears.
  - Remember to save
- For those who have already completed their first 15 and those who have not started, select your set from the sets below, approx. 15 items :)
Cities | Reserved by | Completed! |
---|---|---|
Akaa–Evijärvi | taken | done |
Finström–Hattula | Kim | done |
Hausjärvi–Iisalmi | taken | done |
Iitti–Joroinen | Virpi | done |
Joutsa–Kangasniemi | taken | done |
Kankaanpää–Keminmaa | Julia | done |
Kemiönsaari–Konnevesi | | |
Kontiolahti–Kyyjärvi | | |
Kärkölä–Leppävirta | Anna | done |
Lestijärvi–Maalahti | Ville | done |
Maarianhamina–Mänttä-Vilppula | Niklas | done |
Mäntyharju–Padasjoki | | |
Paimio–Pirkkala | Sabine | done |
Polvijärvi–Pyhäranta | Kim | done |
Pälkäne–Ruokolahti | Susanna | done |
Ruovesi–Siikajoki | Susanna | done |
Siikalatva–Sysmä | Susanna | done |
Säkylä–Tuusula | Susanna | done |
Tyrnävä–Vesanto | Susanna | done |
Vesilahti–Äänekoski | Susanna | done |
Submission: send a link to your list, timeline or map to niklas.laxstrom AT helsinki.fi using subject wmw-05 before Monday 10.10.
Comments
- Data may be modeled differently in different cases
- Many ways to deal with duplicates
- Good for education purposes, for visual learners. Specifically history teaching. Also high school level.
10.10. Extracting content
Slides
- https://docs.google.com/presentation/d/1WrREp9cqbLTqfoqHhZNpTnGR5-ulBn8URP-K8rjjdo4/edit?usp=sharing
Reading
- https://www.mediawiki.org/wiki/Help:CirrusSearch
- https://meta.wikimedia.org/wiki/Research:Quarry
- https://www.mediawiki.org/wiki/API:Main_page
- https://meta.wikimedia.org/wiki/Data_dumps
- http://pythonhosted.org/mediawiki-utilities/
Home assignment
Produce a plain text dump of Finnish Wikipedia articles having a name that starts with Abe. Do not include redirects.
Place the extracted text of each article to a separate file named after the article. Remove characters such as (, ), or & that can be problematic in file names. Use UTF-8 encoding.
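The file naming rule could be sketched as follows. The set of characters treated as unsafe, the underscore replacement for spaces and the .txt suffix are choices made for this sketch rather than requirements of the assignment.

```python
import re
from pathlib import Path

# Characters that tend to cause trouble in file names (an assumed list).
UNSAFE = re.compile(r'[()&/\\:*?"<>|]')


def safe_filename(title: str, suffix: str = ".txt") -> str:
    """Turn an article title into a file name."""
    name = UNSAFE.sub("", title)
    # Collapse any run of whitespace into a single underscore.
    name = re.sub(r"\s+", "_", name.strip())
    return (name or "untitled") + suffix


def write_article(directory: Path, title: str, text: str) -> Path:
    """Write one article's text to its own UTF-8 encoded file."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / safe_filename(title)
    path.write_text(text, encoding="utf-8")
    return path
```

Note that two different titles can map to the same file name once characters are removed, so a real script may want to detect such collisions before overwriting a file.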
You can use the database dumps or the API. Use latest version of the article available with your source.
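For the API route, the list of matching non-redirect articles can be requested with the allpages list module. The sketch below only prepares the request parameters and reads titles out of a decoded JSON response; it performs no network access, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Action API endpoint of Finnish Wikipedia.
API_URL = "https://fi.wikipedia.org/w/api.php"


def allpages_params(prefix, continue_from=None, limit=50):
    """Parameters for listing main-namespace pages by title prefix."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apprefix": prefix,               # titles starting with this text
        "apnamespace": 0,                 # articles only
        "apfilterredir": "nonredirects",  # leave out redirects
        "aplimit": limit,
    }
    if continue_from:
        # Value of continue.apcontinue from the previous response.
        params["apcontinue"] = continue_from
    return params


def request_url(params):
    """Full GET URL for the given parameters."""
    return API_URL + "?" + urlencode(params)


def titles_from_response(data):
    """Extract page titles from a decoded allpages response."""
    return [page["title"] for page in data["query"]["allpages"]]


if __name__ == "__main__":
    print(request_url(allpages_params("Abe")))
```

When a response contains a continue block, the request has to be repeated with its apcontinue value until no continue block is returned.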
You can use mediawiki-utilities and/or other libraries (for example BeautifulSoup) or programming languages. The goal is to extract sentences from the articles. All wikitext mark-up or HTML mark-up should be removed as much as possible, as well as headings, infoboxes, citations, tables, etc.
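To give an idea of what the clean-up step involves, here is a deliberately crude sketch that strips common wikitext constructs with regular expressions from the standard library. It is not how mediawiki-utilities or BeautifulSoup work, and it will miss many cases (tables containing templates, unusual tags, list markers), which is exactly the kind of problem worth writing down in your notes.

```python
import re

# File and category links are dropped entirely; one level of nested
# links inside a caption is tolerated. Namespace names cover Finnish
# and English spellings.
NON_TEXT_LINK = re.compile(
    r"\[\[(?:Tiedosto|Kuva|Luokka|File|Image|Category):"
    r"[^\[\]]*(?:\[\[[^\[\]]*\]\][^\[\]]*)*\]\]"
)


def strip_wikitext(text: str) -> str:
    """Remove common wiki mark-up and return roughly plain text."""
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)           # comments
    text = re.sub(r"<ref[^>]*/>", "", text)                           # <ref ... />
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # citations
    # Remove templates and simple tables innermost-first until stable.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
        text = re.sub(r"\{\|[^{}]*\|\}", "", text)
    text = NON_TEXT_LINK.sub("", text)
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]", r"\1", text)  # [[a|b]] -> b
    text = re.sub(r"\[https?://[^\s\]]+\s*([^\]]*)\]", r"\1", text)    # [url label]
    text = re.sub(r"^=+.*?=+\s*$", "", text, flags=re.MULTILINE)       # headings
    text = re.sub(r"'{2,}", "", text)                                  # bold, italics
    text = re.sub(r"<[^>]+>", "", text)                                # leftover tags
    text = re.sub(r"\n{3,}", "\n\n", text)                             # blank-line runs
    return text.strip()
```

Splitting the result into sentences is a separate step that this sketch does not attempt.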
If you decide to use the dumps, you can do this exercise on prugna.wmwcourse.eqiad.wmflabs (how to access), where the dump file is under /data and mediawiki-utilities and BeautifulSoup4 are already installed. You need to use the python3 command to run your script. Since just iterating the dump takes over 5 minutes, consider splitting your script into two parts: first extract the relevant pages with their content, then clean up the output.
Write down notes about problematic cases that you encounter. Finally, give an estimate of how long it would take to do this kind of dump from all of Finnish Wikipedia.
Submission: send your notes, script and text files in an archive to niklas.laxstrom AT helsinki.fi using subject wmw-06 before Monday 17.10.
17.10. Wikimania
There is no lecture on 17.10.
Wikimania is the largest annual conference of the Wikimedia movement. It has presentations on both technical and social topics and it provides a window to what is happening in the movement.
Home assignment
Watch 2 or 3 presentations from Wikimania 2016 based on your interest. Summarize each presentation and what you learned in a few paragraphs. Be prepared to share highlights with others at the next lecture.
Submission: send your summaries to niklas.laxstrom AT helsinki.fi using subject wmw-07 before Monday 31.10.
24.10. Period break
There is no lecture on 24.10.
31.10. Semantic MediaWiki
Slides
- https://docs.google.com/presentation/d/1P9zTnjlAv9oI4ORVO866ZLJZyTXO8NWlR2GihBCXKHU/edit?usp=sharing
Reading
- https://en.wikipedia.org/wiki/Semantic_MediaWiki
- https://www.semantic-mediawiki.org/wiki/Help:User_manual
- https://www.mediawiki.org/wiki/MediaWiki-Vagrant
Home assignment
You have received the name of your Vagrant wiki at the lecture or via email. If you have not, contact Niklas.
- Check that http://wmwcourse-name.wmflabs.org has a working wiki.
- Connect to your server name.wmwcourse.eqiad.wmflabs with ssh. See wikitech:Help:Getting_Started#Project_Instances for how to do this, if you haven't already.
- Enable the semanticmediawiki role with
  cd /srv/mediawiki-vagrant; vagrant roles enable semanticmediawiki && vagrant provision
- Log in to your wiki using the admin account and change the password.
- Add some pages with semantic annotations to your wiki using the template approach. Countries and capitals are one example, but feel free to use your imagination.
- Create a page with semantic query ({{#ask:}}) that displays some data from those pages. For example countries with their capitals and population in descending order.
Submission: send link to your query page to niklas.laxstrom AT helsinki.fi using subject wmw-08 before Monday 7.11.
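The shape of an inline query such as the countries example can be illustrated with a small generator. This is only a sketch: the category and property names in the usage line are hypothetical and must match whatever your own template annotates, and only a handful of the available #ask parameters are covered.

```python
def ask_query(conditions, printouts, sort=None, order="asc", fmt="table"):
    """Return wikitext for an inline Semantic MediaWiki query.

    conditions -- selection part, e.g. "[[Category:Country]]"
    printouts  -- property names to show as columns
    sort       -- property to sort by (optional)
    order      -- "asc" or "desc", used only together with sort
    """
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    lines = ["{{#ask: " + conditions]
    lines += [" |?" + name for name in printouts]
    if sort:
        lines += [" |sort=" + sort, " |order=" + order]
    lines += [" |format=" + fmt, "}}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # "Country", "Has capital" and "Has population" are made-up names.
    print(ask_query(
        "[[Category:Country]]",
        ["Has capital", "Has population"],
        sort="Has population",
        order="desc",
    ))
```

The printed wikitext can be pasted into a wiki page as is; the query is evaluated when the page is rendered.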
7.11. Forms
Slides
- https://docs.google.com/presentation/d/1l9-pdwqL72SizfJ7mn6jgh2jLEWzcvZ1hNFnqvwDJGE/edit?usp=sharing
Reading
Home assignment
Use the same Vagrant wiki as you did last week.
- Check that http://wmwcourse-name.wmflabs.org has a working wiki.
- Connect to your server name.wmwcourse.eqiad.wmflabs with ssh.
- Install Page Forms. It does not have a role yet, so we are going to install it manually.
  - Go inside your Vagrant virtual machine:
    cd /srv/mediawiki-vagrant; vagrant ssh
  - Download the PageForms extension:
    cd /vagrant/mediawiki/extensions; git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/PageForms
  - Exit the virtual machine:
    exit
  - Register the extension (you can use your favorite editor) by creating a new settings file:
    nano /srv/mediawiki-vagrant/settings.d/20-pageforms.php
    with contents:
    <?php wfLoadExtension( 'PageForms' );
  - Check Special:Version of your wiki to confirm it is installed properly.
- Use Special:CreateForm to create a new form
- Edit your form page to better suit your input by selecting input types, possible values etc.
Submission: send link to your form page to niklas.laxstrom AT helsinki.fi using subject wmw-09 before Monday 14.11.
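Besides looking at Special:Version, the list of installed extensions can be read through the API's siteinfo module. The sketch below builds the request URL and parses a decoded JSON response without making any network call. The /w/api.php path is an assumption about how the course wikis are set up, and the name an extension reports may differ from its directory name.

```python
from urllib.parse import urlencode


def siteinfo_url(api_url):
    """URL that asks a wiki for its list of installed extensions."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "extensions",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def installed_extensions(data):
    """Map extension name to version (None when no version is reported)."""
    return {
        ext["name"]: ext.get("version")
        for ext in data["query"]["extensions"]
    }


if __name__ == "__main__":
    # Replace "name" with the name of your own wiki; the path is assumed.
    print(siteinfo_url("http://wmwcourse-name.wmflabs.org/w/api.php"))
```

Opening the printed URL in a browser shows the same information as the extension table on Special:Version, in machine-readable form.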
14.11. Translate extension
Slides
- https://docs.google.com/presentation/d/1YtgGjPXh23w4sw3UTFAO0_mKeLucJaI7EyerXJqBfpY/edit?usp=sharing
Reading
- https://www.mediawiki.org/wiki/Help:Extension:Translate
- https://www.mediawiki.org/wiki/MediaWiki_Language_Extension_Bundle
- https://www.mediawiki.org/wiki/Multilingual_Semantic_MediaWiki
- https://www.semantic-mediawiki.org/wiki/Localization_and_multilingual_content
Home assignment
Use the same Vagrant wiki as you did last week.
- Connect to your server name.wmwcourse.eqiad.wmflabs with ssh.
- Install MediaWiki Language Extension Bundle. It does have a Vagrant role.
  - Enable the role and provision:
    cd /srv/mediawiki-vagrant; vagrant roles enable mleb; vagrant provision
  - Add some basic configuration:
    nano /srv/mediawiki-vagrant/LocalSettings.php
    with contents:
    $wgGroupPermissions['user']['translate'] = true;
    $wgGroupPermissions['user']['translate-messagereview'] = true;
    $wgGroupPermissions['user']['translate-groupreview'] = true;
    $wgGroupPermissions['user']['pagetranslation'] = true;
    $wgTranslateDocumentationLanguageCode = 'qqq';
    $wgExtraLanguageNames['qqq'] = 'Message documentation';
  - Check Special:Version of your wiki to confirm it is installed properly.
- Make your query results page and form translatable. You can use either page translation or unstructured element translation. Remember that some form labels do not support {{int}}, so it is okay to leave those untranslated.
- Translate your query results page and form to one language other than English.
Submission: send link to your pages to niklas.laxstrom AT helsinki.fi using subject wmw-10 before Monday 21.11.
21.11. Content Translation & Project work
Slides
- https://docs.google.com/presentation/d/1QdRUK8dIB_fv-8VW4fElgv4fX73NLCzTe1K5XcGVg14/edit?usp=sharing
Reading
Home assignment
Try Content Translation
- Log in to Wikipedia
- Go to the beta features tab in your preferences and enable content translation
- Go to Special:ContentTranslation and do a translation (you don't need to publish)
- Write comments answering the following questions:
  - Did you encounter any bugs or issues during translation?
  - Compare the actual source article and what you see in the translation tool's source column. What differences are there?
  - Now that you have tried different kinds of translation tools, what are the benefits and downsides of each tool?
  - How would you decide which tool to use for different types of content?
- If you decide to publish your translation, include a link to the published page
Choose a data set
If you want to do project work, choose a data set. Refer to the slides for what is available.
Submission: send your answers to niklas.laxstrom AT helsinki.fi using subject wmw-11 before Monday 28.11.
28.11. Pywikibot and tips for running MediaWiki
Slides
- https://docs.google.com/presentation/d/1QdRUK8dIB_fv-8VW4fElgv4fX73NLCzTe1K5XcGVg14/edit?usp=sharing (Some slides were skipped last week, going over those today)
- https://docs.google.com/presentation/d/10ILYqWNW-AB-DBLQxppR_8e8nQciBncdDkPSw1osz5Q/edit?usp=sharing
Reading
Home assignment
- No home assignment this week.
5.12. Examples on subobjects and custom parser functions
Slides
- https://docs.google.com/presentation/d/14gvuvu1j62jYD0xZEN4ABJNVa59VO8AVGcycOKAPUr8/edit?usp=sharing
Reading
- https://www.mediawiki.org/wiki/Manual:Using_custom_namespaces
- https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual
Home assignment
- No home assignment this week either.