User:A ka es/OpenRefine/wikimania2019 postersession
Poster Session at #wikimania2019 - empower yourself: first steps
Description | File |
---|---|
* Wikimania 2019 - Poster Session * The Magic of OpenRefine |
Installation
Description | Screenshot |
---|---|
Sources: Linux kit, Mac kit, Windows kit, Documentation for users, Installation Instructions. From the Installation Instructions: "... it runs as a small web server on your own computer and you point your web browser at that web server in order to use Refine. So, think of Refine as a personal and private web application." |
Acquiring Data
stored on your own computer
Source for data examples: (the-nerd.be)
Notes: You can open and upload more than one file at the same time: select several files in the file dialog (this is easier if the files are in the same directory on your computer). This works well if all files have the same data structure.
"flat" data formats like .csv, .tsv, .xls, .xlsx, .ods
Description | Screencast |
---|---|
* OpenRefine start page * left column: select "Create Project" * select "Get data from - This Computer" * main column: click the "Browse..." button * choose the file from your local directory * click "Next" * the data is uploaded and a preview appears * choose the data format (below the columns on the left side; it is usually detected automatically) * check the options below the columns, try out combinations until the preview looks right, and update the preview * if everything fits: name the project and set a tag (fields above the columns) * click the "Create Project" button on the right side |
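
The note about uploading several files with the same structure can be illustrated outside OpenRefine. The following is a minimal Python sketch, not OpenRefine's own import code: the function name `read_same_structure`, the extra `File` column, and the sample file names and rows are all hypothetical and only show why identical headers matter when files are merged into one project.

```python
import csv
import io


def read_same_structure(sources):
    """Combine several CSV texts that share one header into a single row list.

    `sources` is a list of (file_name, csv_text) pairs (hypothetical input shape).
    """
    header = None
    rows = []
    for name, text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            # Files with a different structure should not be merged blindly.
            raise ValueError(f"{name}: columns differ from the first file")
        for row in reader:
            row["File"] = name  # remember which file each row came from
            rows.append(row)
    return header, rows


# Hypothetical sample data standing in for two uploaded files.
files = [
    ("part_a.csv", "name,country\nAda,BE\nBen,FR\n"),
    ("part_b.csv", "name,country\nCleo,DE\n"),
]
header, rows = read_same_structure(files)
print(header, len(rows))
```

If the headers differ, the sketch stops with an error instead of silently producing misaligned columns, which is the situation the note warns about.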
structured data formats like .xml, .json
Description | Screencast |
---|---|
* OpenRefine start page * left column: select "Create Project" * select "Get data from - This Computer" * main column: click the "Browse..." button * choose the file from your local directory * click "Next" * specify the record path in the preview window (hover over the curly brackets and click the element that contains all the data you need) * check the preview; if something is missing, click the "Please specify a record path first" button and start again * if everything fits: name the project and set a tag (fields above the columns) * click the "Create Project" button on the right side |
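
What "specifying the record path" means can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper `records_at`, the key names, and the sample JSON are hypothetical, and OpenRefine itself lets you pick the path by clicking in the preview rather than writing code.

```python
import json


def records_at(document, path):
    """Follow the keys in `path` and return the list of records found there."""
    node = document
    for key in path:
        node = node[key]
    if not isinstance(node, list):
        raise TypeError("the record path must end at a list of records")
    return node


# Hypothetical JSON with some metadata next to the records we actually want.
raw = '{"meta": {"count": 2}, "parliaments": [{"name": "Parliament A"}, {"name": "Parliament B"}]}'
records = records_at(json.loads(raw), ["parliaments"])
names = [record["name"] for record in records]
print(names)
```

Choosing a path that is too deep (for example a single record) loses data, and a path that is too shallow (the whole document) yields one row, which is why checking the preview matters.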
special case .html
Description |
---|
* open the .html file in a browser * copy the table structure * paste it into OpenRefine's clipboard window (see the next section) |
copy & paste from tables
Description | Screencast |
---|---|
* copy a table structure from a source (e.g. website, .pdf file, text file, spreadsheet) * OpenRefine start page * left column: select "Create Project" * select "Get data from - Clipboard" * paste the copied table structure into the clipboard window * click the "Next" button below * the data is uploaded and a preview appears * choose the data format (below the columns on the left side; it is usually detected automatically) * check the options below the columns, try out combinations until the preview looks right, and update the preview * if everything fits: name the project and set a tag (fields above the columns) * click the "Create Project" button on the right side |
load data via API or URL
Source for data examples: abgeordnetenwatch.de API parliaments
Notes: You can request more than one URL at the same time: click the "Add Another URL" button and enter the next URL. Once all URLs are entered, click the "Next" button. This works well if you are sure that the data structure behind all URLs is the same.
Description | Screencast |
---|---|
* OpenRefine start page * left column: select "Create Project" * select "Get data from - Web Addresses (URLs)" * paste or type the URL into the field * click "Next" * the next step depends on the data format: select a record path or check the options * if everything fits: name the project and set a tag (fields above the columns) * click the "Create Project" button on the right side |
Exploring Data
Description | Screencasts |
---|---|
Once you have a data project in OpenRefine, you can explore and edit its content in many ways; the easiest are facets and filters. | |
You can cluster values to find inconsistent spellings and errors and to correct them. |
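
To give an idea of what facets and clustering do, here is a simplified Python sketch. It is an approximation in the spirit of a text facet and of key-collision clustering with a fingerprint key, not OpenRefine's actual implementation; the function names and the sample values are hypothetical.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Build a normalized key: trim, lowercase, strip accents and punctuation,
    then sort the unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def text_facet(values):
    """Count how often each distinct value occurs (what a text facet displays)."""
    return Counter(values)


def cluster(values):
    """Group different spellings that share the same fingerprint key."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical column values with inconsistent spellings.
values = ["Brussels", "brussels ", "Brussels.", "Strasbourg", "STRASBOURG", "Luxembourg"]
facet = text_facet(values)
clusters = cluster(values)
print(facet)
print(clusters)
```

The facet shows six distinct values although only three places are meant; the clusters point to the spellings that probably belong together and can be merged into one value.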
Preparing Data
Description | Screencast |
---|---|
The file in the example came with the following note: "Brussels phone numbers start with +32(0)228 45; change the 5 to 9 for the fax. Strasbourg phone numbers start with +33(0)388 1 75; again, change the 5 to 9 for the fax." So we have to derive the fax numbers from the phone numbers, and we have to remove the "@" from the Twitter user names. |
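
The two transformations from the note can be sketched in Python as follows. This is a sketch under assumptions: it assumes each phone cell holds the full number beginning with one of the two quoted prefixes, the function names are hypothetical, and the trailing digits in the usage example are made up. In OpenRefine itself you would do this with a column transformation.

```python
# Prefix pairs taken from the note: the final 5 of the prefix becomes 9 for the fax.
FAX_PREFIXES = {
    "+32(0)228 45": "+32(0)228 49",      # Brussels
    "+33(0)388 1 75": "+33(0)388 1 79",  # Strasbourg
}


def fax_from_phone(phone):
    """Return the fax number for a phone number, or None if the prefix is unknown.

    Assumption: the cell contains the full number starting with the prefix.
    """
    for phone_prefix, fax_prefix in FAX_PREFIXES.items():
        if phone.startswith(phone_prefix):
            return fax_prefix + phone[len(phone_prefix):]
    return None


def strip_at(handle):
    """Remove surrounding whitespace and a leading "@" from a Twitter user name."""
    return handle.strip().lstrip("@")


# Hypothetical cell values.
print(fax_from_phone("+32(0)228 45123"))
print(fax_from_phone("+33(0)388 1 75123"))
print(strip_at("@example_mep"))
```

Replacing only the prefix, rather than every 5 in the number, keeps the remaining digits untouched.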
Combining Data
Description | Screencast |
---|---|
There are two OpenRefine projects: the file from the European Parliament, enriched with the Q-numbers of the MEPs, and the result of a Wikidata query. We want to combine both to find out which MEPs have a parliamentary term entry in Wikidata and where the gaps are. We use the Q-numbers as the key. |
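
The idea of combining two projects on a shared key can be sketched in Python. This is an illustration only: the function `split_by_match`, the column name `qid`, and the placeholder Q-numbers and names are hypothetical and do not refer to real Wikidata items.

```python
def split_by_match(mep_rows, wikidata_rows, key="qid"):
    """Split the MEP rows into those whose key appears in the query result
    and those whose key does not (the gaps)."""
    found = {row[key] for row in wikidata_rows}
    matched = [row for row in mep_rows if row[key] in found]
    gaps = [row for row in mep_rows if row[key] not in found]
    return matched, gaps


# Hypothetical rows: placeholder Q-numbers, not real Wikidata items.
meps = [
    {"name": "MEP one", "qid": "Q0000001"},
    {"name": "MEP two", "qid": "Q0000002"},
    {"name": "MEP three", "qid": "Q0000003"},
]
query_result = [
    {"qid": "Q0000001", "term": "hypothetical term"},
    {"qid": "Q0000003", "term": "hypothetical term"},
]
matched, gaps = split_by_match(meps, query_result)
print([row["name"] for row in gaps])
```

Using the Q-number as the key avoids the mismatches that name-based matching would produce, for example through different spellings of the same name.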
Exporting Data
Description | Screencast |
---|---|
You can export your data with one click to many formats: as an OpenRefine project to share with others, as common spreadsheet formats or csv/tsv, or as an html file. You can also choose which columns to export with a custom exporter. |
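
What "making your own choice of columns" amounts to can be shown with a short Python sketch. It is not OpenRefine's exporter; the function `export_columns` and the sample row are hypothetical, and only the standard `csv` module is used.

```python
import csv
import io


def export_columns(rows, columns, delimiter=","):
    """Write only the chosen columns, in the chosen order, as delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter=delimiter,
        extrasaction="ignore",  # silently skip columns that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical project row with one column we do not want to export.
rows = [{"name": "MEP one", "fax": "+32(0)228 49123", "twitter": "example_mep"}]
tsv_text = export_columns(rows, ["name", "twitter"], delimiter="\t")
print(tsv_text)
```

Switching the delimiter between a comma and a tab is the only difference between the csv and tsv output here.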
"Magic" (Bonus)
regex
Description | Screencast |
---|---|
first impressions | ... (content and screencast are coming soon) |
GREL
Description | Screencast |
---|---|
first impressions | ... (content and screencast are coming soon) |
reconciliation services
Description | Screencast |
---|---|
first impressions | ... (content and screencast are coming soon) |
"about" section => editing metadata
Description | Screencast |
---|---|
If you work with many projects, using metadata and tags to organize them is very useful. If you skipped this when creating a project, you can still edit both in the "about" section of every project. |